METHODS OF PRODUCING POLYKETIDE SYNTHASE MUTANTS
AND COMPOSITIONS AND USES THEREOF 

FIELD OF THE INVENTION 

The present invention relates to methods for producing mutant polyketide 
synthases, and for altering the activity and/or substrate specificity of putative native
and mutant polyketide synthases. The present invention further relates to 
compositions and uses of mutant polyketide synthases. 

BACKGROUND 

Advances in molecular biology have allowed the development of biological 
agents useful in modulating protein or nucleic acid activity or expression, 
respectively. Many of these advances are based on identifying the primary sequence 
of the molecule to be modulated. For example, determining the nucleic acid sequence 
of DNA or RNA allows the development of antisense or ribozyme molecules. 
Similarly, identifying the primary sequence allows for the identification of sequences 
that may be useful in creating monoclonal antibodies. However, often the primary 
sequence of a protein is insufficient to develop therapeutic or diagnostic molecules 
due to the secondary, tertiary or quaternary structure of the protein from which the
primary sequence is obtained. The process of designing potent and specific inhibitors 
or activators has improved with the arrival of techniques for determining the three- 
dimensional structure of an enzyme or polypeptide to be modulated. 

The phenylpropanoid synthetic pathway in plants produces a class of 
compounds known as anthocyanins, which are used for a variety of applications.
Anthocyanins are involved in pigmentation and protection against UV photodamage,
synthesis of anti-microbial phytoalexins, and are flavonoid inducers of Rhizobium
nodulation genes. As medicinal natural products, the phenylpropanoids exhibit
cancer chemopreventive activity, as well as anti-mitotic, estrogenic, anti-malarial, 
anti-oxidant, and antiasthmatic activities. The benefits of consuming red wine, which 
contains significant amounts of 3,4',5-trihydroxystilbene (resveratrol) and other 
phenylpropanoids, highlight the dietary importance of these compounds. Chalcone 
synthase (CHS), a polyketide synthase, plays an essential role in the biosynthesis of
plant phenylpropanoids. 

An improvement in the understanding of the structure/function of these 
enzymes would allow for the exploitation of the synthetic capabilities of known 
enzymes for production of useful new chemical compounds, or allow for the creation
of novel non-native enzymes having new synthetic capabilities. A need exists, 
therefore, for a detailed understanding of the molecular basis of the chemical reactions 
involved in polyketide synthesis. The present invention addresses this and related 
needs. 

SUMMARY OF THE INVENTION

In accordance with the present invention there are presented crystalline 
polyketide synthases and the three-dimensional coordinates derived therefrom. Three- 
dimensional coordinates have been obtained for an active form of chalcone synthase 
and several active and inactive mutants thereof, both with and without substrate or 
substrate analog. Similar results have been obtained for the polyketide synthases
stilbene synthase (STS) and pyrone synthase (2-PS). 

One aspect of the present invention that is made possible by results described 
herein is that the three-dimensional properties of polyketide synthase proteins are 
determined, in particular the three-dimensional properties of the active site. The
invention features specific coordinates of at least fourteen α-carbon atoms defined for
the active site in three-dimensional space. R-groups attached to said α-carbons are
defined such that mutants can be made by changing at least one R-group found in the
synthase active site. Such mutants may have unique and useful properties. Thus, in
another embodiment of the invention, there are provided isolated non-native (e.g.,
mutant) synthase(s) having at least fourteen active site α-carbons having the structural
coordinates disclosed herein (see, for example, Table 1) and one or more R-groups
other than those found in native polyketide synthase(s).

The three-dimensional coordinates disclosed herein can be employed in a
variety of methods. The polyketide synthase used in the crystallization studies
disclosed herein is a chalcone synthase derived from Medicago sativa (alfalfa). A
large number of proteins have been isolated and sequenced which have primary amino
acid sequences similar to that of chalcone synthase, but for which substrate specificity
and/or product is unknown. Thus, in another embodiment of the present invention, 
there are provided methods for altering the activity and/or substrate specificity of a 
putative polyketide synthase. There are further provided methods for altering the 
polyketide content of a plant. 

Other aspects, embodiments, advantages, and features of the present invention
will become apparent from the following specification.

BRIEF DESCRIPTION OF FIGURES 

Figure 1 presents the chemical structures of chalcone, naringenin, resveratrol, 
and cerulenin. 

Figure 2 presents the final SIGMAA-weighted 2Fo-Fc electron density map of the
CHS-resveratrol complex in the vicinity of the resveratrol binding site. The map is
contoured at 1σ.

Figure 3 shows a ribbon representation of the CHS homodimer. The 
approximate alpha carbon positions of Met 137 from each of the monomers are 
labeled accordingly. Naringenin completely fills the coumaroyl-binding and
cyclization pockets while the CoA binding tunnels are highlighted by black arrows.
Produced with MOLSCRIPT and rendered with POV-Ray. 

Figure 4 shows a comparison of chalcone synthase and 3-ketoacyl-CoA 
thiolase. Ribbon view of the CHS monomer is oriented perpendicular to the dimer 
interface. The active site cysteine (Cys 164) and the location of bound CoA are
rendered as ball and stick models. In addition, strands β1d and β2d of the cyclization
pocket are noted. The reaction catalyzed by CHS is illustrated with the coumaroyl-
and malonyl-derived portions of chalcone, respectively. The thiolase monomer is
depicted in the same orientation as CHS with the active site cysteine (Cys 125)
modeled and the reaction of thiolase as indicated. Figure prepared with MOLSCRIPT
and rendered with POV-Ray.

Figure 5 collectively shows structures of CHS-acyl-CoA complexes. The
ribbon diagram in panel Figure 5A (on the top left) is the same as Figure 3. The CoA
binding region depicted in stereo is bounded by a black box in the upper ribbon
diagram. Close-up stereoviews of the C164S mutant CoA binding region for the
malonyl- and hexanoyl-CoA complexes are depicted in Figures 5B and 5C,
respectively. This mutant retains decarboxylation activity and an acetyl-CoA complex
is observed crystallographically for the malonyl-CoA complex. In each complex, 
placement of the Met 137 loop originating from the dyad-related molecule spatially 
defines one wall of the cyclization pocket. Hydrogen bonds are depicted as spheres. 
Figure prepared with MOLSCRIPT and rendered with POV-Ray. 

Figure 6A shows the CHS-naringenin complex viewed down the CoA-binding
tunnel. The ribbon diagram at the top left has been rotated 90 degrees around the
y-axis from the orientation shown in Figure 3. This view approximates the global
orientation of the CHS dimer used for the close-up view of the naringenin binding site
depicted in stereo. Again, the black box highlights the region of CHS shown in stereo
close-up. Hydrogen bonds are depicted as dashed cylinders. Figure 6B illustrates a
comparison of the CHS apoenzyme, CHS-naringenin, and CHS-resveratrol structures.
Protein backbone atoms for the three refined structures (apoenzyme, naringenin, and
resveratrol) were superimposed by least squares fit in O. The positions of bound
naringenin and resveratrol are shown. For reference, a modeled low energy
conformation of chalcone is indicated by dashed cylinders. Strands β1d and β2d for
each complex are also depicted (see Figure 4). β2d does not change in any of the
complexes examined, but β1d moves in the CHS-resveratrol complex. Figure 6C
presents a representative sequence alignment of the β1d-β2d region, with
positions 255, 266, and 268 highlighted. The first three sequences follow a CHS-like
cyclization pathway, while the last three use the STS-cyclization pathway. Figure
prepared with MOLSCRIPT and rendered with POV-Ray.

Figure 7 presents the proposed reaction mechanism for chalcone synthesis. 
The three boxed regions labeled 1, 2, and 3 depict the addition of acetate units derived
from malonyl-CoA during the elongation of polyketide intermediates. Box 1 is
depicted in expanded fashion to illustrate the mechanistic details governing the
decarboxylation, enolization, and condensation phase of ketide elongation. Smaller
black arrows depict the flow of electrons. Each acetate unit of the malonyl-CoA
thioesters is coded to emphasize the portions of chalcone derived from each of three
elongation reactions using malonyl-CoA. Cyclization and aromatization of the
enzyme-bound tetraketide lead to formation of chalcone. Hydrogen bonds are shown
as dashed lines. Coenzyme A is symbolized as a circle. 

Figure 8 presents a comparison of the active site volumes of CHS from alfalfa 
and CHS from Gerbera hybrida. The active site volumes available for binding ketide 
intermediates were calculated with VOIDOO for the CHS-CoA complex and for a
homology model of GCHS2 with CoA. The cavities are shown as a wire mesh. The 
homology model of GCHS2 was generated using MODELER and the volume 
calculated and displayed as for CHS. The numbering scheme is for alfalfa CHS 
homodimer. Figure prepared with MOLSCRIPT and rendered with POV-Ray. 

Figure 9 shows an example of a computer system in block diagram form.

Figure 10 shows the chalcone synthase reaction sequence including initiation, 
elongation and cyclization. 

Figure 11 shows an amino acid sequence alignment of P. sylvestris STS and 
M. sativa CHS, along with an evolutionary intermediate, P. sylvestris CHS.

Figure 12 shows phenylpropanoid metabolism. From a common linear
phenylpropanoid tetraketide intermediate, resveratrol is formed by STS and chalcone
is formed by CHS.

Figure 13 shows different reaction schemes of CHS and STS. STS forms
resveratrol via an intramolecular aldol condensation and CHS utilizes an 
intramolecular Claisen condensation to produce chalcone. 

Figure 14 shows an autoradiographic gel following thin layer chromatography. 
Wild type CHS produces chalcone, which spontaneously converts to naringenin, the
position of which is indicated by the arrow on the left. Wild type STS produces
resveratrol, the position of which is indicated by the arrow on the right. Functional
conversion of CHS to STS (i.e., the production of the alternate product from the same
intermediate) results in diminished production of naringenin and increased production
of resveratrol. Various mutants of CHS produce varying amounts of resveratrol,
showing that CHS activity can be altered to STS-like activity to different extents by
different mutations. 

Figure 15 shows the crystalline structure of CHS. Circled areas A1 to A4
represent regions in which mutations result in the conversion of CHS activity to
STS-like activity. The 18xCHS mutant contains mutations in these regions.

Figure 16 shows the crystalline structure of CHS with area B1, mutated in the
22xCHS mutant, circled.

Figure 17 shows amino acid sequences of homologous sequences from STS 
family members. 

Figure 18 shows the kinetics of the 18xCHS mutant in comparison to the wild type
CHS and STS.

Figure 19 shows a comparison of the crystal structures of the wild type CHS 
(alfalfa), two types of STS (pine and peanut) and the 18xCHS mutant. Areas A1 to
A4 are as indicated in Figure 14. A comparison of the amino acid sequence in these
areas is also provided. The stars indicate the residues mutated in the 8xCHS mutant.



Figure 20 shows that the 8xCHS mutant has activity that is similar to the 
18xCHS mutant, i.e. an alteration of the CHS activity to an STS-like activity. The 
8xCHS mutant contains five mutations in Area A2 and three additional changes in
Areas A1 and A3. The mutations in the 8xCHS are a subset of the mutations in the
18xCHS mutant, eliminating 10 neutral mutations found in the 18xCHS mutant. 

Figure 21 shows the proposed mechanism of cyclization specificity in STS as 
compared to CHS, which results in the different end-products. STS elimination of
terminal CO2 favors intramolecular C2 to C7 aldol condensation, while CHS causes
intramolecular C6 to C1 Claisen condensation coupled to thioester cleavage.

Figure 22 shows the aldol cyclization switch region as viewed from the
CoA-binding tunnel, involved in the mechanisms depicted in Figure 21.

DETAILED DESCRIPTION OF THE INVENTION

The phenylpropanoid synthetic pathway in plants produces a class of 
compounds known as anthocyanins, which are used for a variety of applications.
Anthocyanins are involved in pigmentation and protection against UV photodamage,
synthesis of anti-microbial phytoalexins, and are flavonoid inducers of Rhizobium
nodulation genes. As medicinal natural products, the phenylpropanoids exhibit
cancer chemopreventive activity, as well as anti-mitotic, estrogenic, anti-malarial,
anti-oxidant, and antiasthmatic activities. The benefits of consuming red wine, which
contains significant amounts of 3,4',5-trihydroxystilbene (resveratrol) and other
phenylpropanoids, highlight the dietary importance of these compounds. 

Polyketides are a large class of compounds and include a broad range of
antibiotics, immunosuppressants and anticancer agents which together account for
sales of over $5 billion per year. Polyketides are molecules which are an extremely
rich source of bioactivities, including antibiotics (e.g., tetracyclines and
erythromycin), anti-cancer agents (e.g., daunomycin), immunosuppressants (e.g.,
FK506 and rapamycin), and veterinary products (e.g., monensin) and the like. Many
polyketides (produced by polyketide synthases) are valuable as therapeutic agents.
Polyketide synthases are multifunctional enzymes that catalyze the biosynthesis of a
huge variety of carbon chains differing in length and patterns of functionality and
cyclization. 

Chalcone synthase (CHS), a polyketide synthase, plays an essential role in the 
biosynthesis of plant phenylpropanoids. CHS supplies 4,2',4',6'-tetrahydroxychalcone
(chalcone) to downstream enzymes that synthesize a diverse set of flavonoid
phytoalexins and anthocyanin pigments. Synthesis of chalcone by CHS involves the
sequential condensation of one p-coumaroyl- and three malonyl-coenzyme A (CoA)
molecules (Kreuzaler and Hahlbrock, Eur. J. Biochem. 56:205-213, 1975). After
initial capture of the p-coumaroyl moiety, each subsequent condensation step begins
with decarboxylation of malonyl-CoA at the CHS active site; the resulting acetyl-CoA
carbanion then serves as the nucleophile for chain elongation. 

Ultimately, these reactions generate a tetraketide intermediate that cyclizes by 
a Claisen condensation into a hydroxylated aromatic ring system. This mechanism 
mirrors those of the fatty acid and polyketide synthases but with significant
differences. CHS uses CoA-thioesters for shuttling substrates and intermediate
polyketides instead of the acyl carrier proteins used by the fatty acid synthases. Also,
unlike these enzymes, which function as either multichain or multimodular enzyme
complexes catalyzing distinct reactions at different active sites, CHS functions as a
unimodular polyketide synthase and carries out a series of decarboxylation,
condensation, cyclization, and aromatization reactions at a single active site.

A number of plant and bacterial polyketide synthases related to CHS by 
sequence identity, including stilbene synthase (STS), bibenzyl synthase (BBS), and 
acridone synthase (ACS), share a common chemical mechanism, but differ from CHS 
in their substrate specificity and/or in the stereochemistry of the polyketide cyclization 
reaction. For example, STS condenses one coumaroyl- and three malonyl-CoA
molecules, like CHS, but synthesizes resveratrol through a structurally distinct 
cyclization intermediate. 

While the cloning of over 400 CHS-related genes, and characterization of 
some of these proteins, provides insight into their biological function, it remains 
unclear how these enzymes perform multiple decarboxylation and condensation
reactions and how they dictate the stereochemistry of the final polyketide cyclization 
reaction. Furthermore, despite significant advances in the biosynthetic manipulation 
of structurally complex and biologically important natural products, there remains a 
lack of structural information on polyketide synthases from any source.

As used herein, "naturally occurring amino acid" and "naturally occurring R- 
group" includes L-isomers of the twenty amino acids naturally occurring in proteins. 
Naturally occurring amino acids are glycine, alanine, valine, leucine, isoleucine, 
serine, methionine, threonine, phenylalanine, tyrosine, tryptophan, cysteine, proline, 
histidine, aspartic acid, asparagine, glutamic acid, glutamine, arginine, and lysine.
Unless specially indicated, all amino acids referred to in this application are in the
L-form.

"Unnatural amino acid" and "unnatural R-group" includes amino acids that are 
not naturally found in proteins. Examples of unnatural amino acids included herein 
are racemic mixtures of selenocysteine and selenomethionine. In addition, unnatural
amino acids include the D or L forms of, for example, nor-leucine,
para-nitrophenylalanine, homophenylalanine, para-fluorophenylalanine,
3-amino-2-benzylpropionic acid, homoarginines, D-phenylalanine, and the like.

"R-group" refers to the substituent attached to the α-carbon of an amino acid
residue. An R-group is an important determinant of the overall chemical character of
an amino acid. There are twenty natural R-groups found in proteins, which make up
the twenty naturally occurring amino acids.

"α-carbon" refers to the chiral carbon atom found in an amino acid residue.
Typically, four substituents will be covalently bound to said α-carbon including an
amine group, a carboxylic acid group, a hydrogen atom, and an R-group. The
α-carbon atoms can also be referred to by their crystal structure coordinates as a
convenient reference point. Table 1 provides the structural coordinates of α-carbons
found in the active site of a polyketide synthase of the present invention.

TABLE 1

Active Site
α-Carbon Number    X Position    Y Position    Z Position    Amino Acid

 1                 25.378        49.320        57.979        Thr 132
 2                 26.089        45.704        56.981        Ser 133
 3                 - 35.423      42.296        66.622        Met 137*
 4                 25.212        49.977        62.196        Gln 161
 5                 22.745        44.120        51.193        Thr 194
 6                 19.022        42.892        54.600        Thr 197
 7                 13.850        48.144        50.791        Gly 211
 8                 22.118        48.048        46.357        Gly 216
 9                 13.001        54.666        59.688        Ile 254
10                 16.434        48.819        61.334        Gly 256
11                 18.715        43.328        59.526        Leu 263
12                 13.943        47.516        57.567        Phe 265
13                  9.252        52.715        57.456        Leu 267
14                 23.141        53.552        52.148        Ser 338

* Met 137 from the second monomer
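
By way of illustration only, the α-carbon coordinates of Table 1 can be used as reference points for simple geometric calculations. The following Python sketch copies five of the Table 1 entries and computes the distance between two α-carbons and the centroid of the selected set. The names CA_COORDS, ca_distance, and centroid are hypothetical and are not part of the disclosure, and the coordinates are assumed to be expressed in ångströms, as is conventional for Protein Data Bank entries.

```python
import math

# Five active-site alpha-carbon coordinates (x, y, z) copied from Table 1.
# Units are assumed to be angstroms (PDB convention); Table 1 does not state them.
CA_COORDS = {
    "Thr 132": (25.378, 49.320, 57.979),
    "Ser 133": (26.089, 45.704, 56.981),
    "Gln 161": (25.212, 49.977, 62.196),
    "Thr 194": (22.745, 44.120, 51.193),
    "Ser 338": (23.141, 53.552, 52.148),
}


def ca_distance(res_a, res_b, coords=CA_COORDS):
    """Euclidean distance between the alpha-carbons of two residues."""
    return math.dist(coords[res_a], coords[res_b])


def centroid(coords=CA_COORDS):
    """Arithmetic mean position of the selected alpha-carbons."""
    n = len(coords)
    xs, ys, zs = zip(*coords.values())
    return (sum(xs) / n, sum(ys) / n, sum(zs) / n)


if __name__ == "__main__":
    d = ca_distance("Thr 132", "Ser 133")
    print(f"Thr 132 to Ser 133 alpha-carbon distance: {d:.3f}")
    print("Centroid of selected alpha-carbons:", centroid())
```

Adjacent residues such as Thr 132 and Ser 133 give a distance of roughly 3.8 under these coordinates, which is the spacing expected for consecutive α-carbons and serves as a quick consistency check on the transcribed values.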



"Positively charged amino acid" and "positively charged R-group" includes 
any naturally occurring or unnatural amino acid having a side chain which is 
positively charged under normal physiological conditions. Examples of positively
charged, naturally occurring amino acids include arginine, lysine, histidine, and the 
like. 

"Negatively charged amino acid" and "negatively charged R-group" includes 
any naturally occurring or unnatural amino acid having a side chain which is 
negatively charged under normal physiological conditions. Examples of negatively
charged, naturally occurring amino acids include aspartic acid, glutamic acid, and the 
like. 

"Hydrophobic amino acid" and "hydrophobic R-group" includes any naturally 
occurring or unnatural amino acid that is relatively insoluble in water. Examples of 
naturally occurring hydrophobic amino acids are alanine, leucine, isoleucine, valine,
proline, phenylalanine, tryptophan, methionine, and the like. 

"Hydrophilic amino acid" and "hydrophilic R-group" includes any naturally 
occurring or unnatural amino acid that is relatively soluble in water. Examples of 
naturally occurring hydrophilic amino acids include serine, threonine, tyrosine,
asparagine, glutamine, cysteine, and the like. 
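
The four R-group classes defined above can be captured, for illustration, as a simple lookup. The Python sketch below encodes only the naturally occurring examples named in the foregoing definitions, using one-letter amino acid codes; the names R_GROUP_CLASSES and classify_r_group are hypothetical. Because each definition ends with "and the like", the lists are not exhaustive, and residues not named above (for example, glycine) are reported as unclassified rather than assigned by assumption.

```python
# Example residues named in the definitions above, as one-letter codes.
R_GROUP_CLASSES = {
    "positively charged": set("RKH"),   # arginine, lysine, histidine
    "negatively charged": set("DE"),    # aspartic acid, glutamic acid
    "hydrophobic": set("ALIVPFWM"),     # alanine ... methionine
    "hydrophilic": set("STYNQC"),       # serine ... cysteine
}


def classify_r_group(residue):
    """Return the class of a one-letter residue, or 'unclassified'."""
    residue = residue.upper()
    for name, members in R_GROUP_CLASSES.items():
        if residue in members:
            return name
    return "unclassified"


if __name__ == "__main__":
    # One-letter codes for some active-site residues listed in Table 1.
    for res in "TSMQGILF":
        print(res, classify_r_group(res))
```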

"Mutant" or "mutated synthase" refers to a polyketide synthase polypeptide 
containing amino acid residues that have been substituted or modified with respect to 
a wild type polyketide synthase (for example, the alfalfa CHS having the crystal 
structure coordinates of Protein Data Bank (PDB) Accession No. 1BI5). Examples of
mutant or mutated synthase polypeptides include those having PDB Accession Nos.
1D6F, 1D6I, and 1D6H (the contents of which are incorporated by reference herein in
their entirety). Further examples of mutant or mutated synthase polypeptides are set 
forth in a set of crystal structure coordinates in Appendix C, the 18xCHS mutant. 
Access to the foregoing information in the Protein Data Bank can be found at 
www.rcsb.org/pdb. The Protein Data Bank is operated by the Research Collaboratory 
for Structural Bioinformatics (RCSB). 

The R-groups of known isolated polyketide synthases can be readily 
determined by consulting sequence databases well known in the art, such as, for 
example, Genbank. Additional R-groups found inside and/or outside of the active site
may or may not be the same. R-groups may be a natural R-group, unnatural R-group,
hydrophobic R-group, hydrophilic R-group, positively charged R-group, negatively
charged R-group, and the like. The term "mutant" refers to the configuration of
R-groups within the active site and/or groups involved in second-tier interactions, for
example those resulting in the alteration of CHS native activity.

"Non-native" or "non-native synthase" refers to synthase proteins that are not 
found in nature, whether isolated or not. A non-native synthase may, for example, be 
a mutated synthase (see, for example, PDB Accession Nos. 1D6F, 1D6I, 1D6H and 
Appendix C).

"Native" or "native synthase" or "wild type synthase" refers to synthase
proteins that are produced in nature, e.g., are not mutants (see, for example, PDB 
Accession Nos. 1BI5 (CHS), 1EE0 (2-PS)). 

"Isolated" refers to a protein or nucleic acid that has been identified and 
separated from its natural environment. Contaminant components of its natural
environment may include enzymes, hormones, and other proteinaceous or
non-proteinaceous solutes. In one embodiment, the isolated molecule, in the case of a
protein, will be purified to a degree sufficient to obtain at least 15 residues of
N-terminal or internal amino acid sequence or to homogeneity by SDS-PAGE under
reducing or non-reducing conditions using Coomassie blue or silver stain. In the case
of a nucleic acid the isolated molecule will preferably be purified to a degree
sufficient to obtain a nucleic acid sequence using standard sequencing methods.

"Degenerate variations thereof" refers to changing a gene sequence using the
degenerate nature of the genetic code to encode proteins having the same amino acid
sequence yet having a different gene sequence. For example, polyketide synthases of
the present invention are based on amino acid sequences. Degenerate gene variations
thereof can be made encoding the same protein due to the plasticity of the genetic
code, as described herein. 
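
The notion of a degenerate variation can be illustrated with a short Python sketch that translates two different nucleotide sequences with the standard genetic code and confirms that they encode the same peptide. The function names translate and are_degenerate_variants are hypothetical, and the two nine-nucleotide sequences in the usage example are invented for illustration (they are not taken from any CHS gene); both encode Met-Val-Ser, the first three residues of SEQ ID NO:1.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G
# (first base varies slowest, third base fastest); '*' marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna):
    """Translate a DNA coding sequence (length divisible by 3) to protein."""
    dna = dna.upper()
    if len(dna) % 3 != 0:
        raise ValueError("sequence length must be a multiple of three")
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


def are_degenerate_variants(dna_a, dna_b):
    """True if two distinct DNA sequences encode the same protein."""
    return dna_a.upper() != dna_b.upper() and translate(dna_a) == translate(dna_b)


if __name__ == "__main__":
    # Hypothetical coding sequences differing only at synonymous codons.
    print(translate("ATGGTTTCT"), translate("ATGGTGAGC"))
    print(are_degenerate_variants("ATGGTTTCT", "ATGGTGAGC"))
```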

"Expression" refers to transcription of a gene or nucleic acid sequence, stable 
accumulation of nucleic acid, and the translation of that nucleic acid to a polypeptide
sequence. Expression of genes also involves transcription of the gene to make RNA,
processing of RNA into mRNA in eukaryotic systems, and translation of mRNA into
proteins. It is not necessary for the genes to integrate into the genome of a cell in order
to achieve expression. This definition in no way limits expression to a particular
system or to being confined to cells or a particular cell type and is meant to include
cellular, transient, in vitro, in vivo, and viral expression systems in both prokaryotic
and eukaryotic cells, and the like.

"Foreign" or "heterologous" gene refers to a gene encoding a protein whose
exact amino acid sequence is not normally found in the host cell.

"Promoter" and "promoter regulatory element", and the like, refer to a
nucleotide sequence element within a nucleic acid fragment or gene that controls the
expression of that gene. These can also include expression control sequences.
Promoter regulatory elements, and the like, from a variety of sources can be used
efficiently to promote gene expression. Promoter regulatory elements are meant to
include constitutive, tissue-specific, developmental-specific, inducible, subgenomic
promoters, and the like. Promoter regulatory elements may also include certain
enhancer elements or silencing elements that improve or regulate transcriptional
efficiency. Promoter regulatory elements are recognized by RNA polymerases,
promote the binding thereof, and facilitate RNA transcription.

A polypeptide is a chain of amino acids, regardless of length or post- 
translational modification (e.g., glycosylation or phosphorylation). A polypeptide or 
protein refers to a polymer in which the monomers are amino acid residues, which are 
joined together through amide bonds. When the amino acids are alpha-amino acids,
either the L-optical isomer or the D-optical isomer can be used, the L-isomers being
typical. A synthase polypeptide of the invention is intended to encompass an amino acid
sequence as set forth in SEQ ID NO: 1 (see Table 2), or SEQ ID NO: 1 having one or
more mutations. Mutations include deletions and additions of amino acid residues, and
substitutions of one amino acid residue for another. For example, substitutions include:
D96A (where D at position 96 of a wild type CHS is changed to A), V98L, V99A,
V100M, T131S, S133T, G134T, V135P, M137L, Y157V, M158G, M159V, Y160F,
C164A, Q165H, D255G, H257K, L258V, H266Q, L268K, K269G, D270A, G273D,
H303Q, and N336A, as well as mutants, variants and conservative substitutions thereof
comprising L- or D-amino acids, including modified sequences such as glycoproteins.

TABLE 2 (SEQ ID NO:1)

MVSVSEIRKA QRAEGPATIL AIGTANPANC VEQSTYPDFY FKITNSEHKT ELKEKFQRMC
DKSMIKRRYM YLTEEILKEN PNVCEYMAPS LDARQDMVVV EVPRLGKEAA VKAIKEWGQP
KSKITHLIVC TTSGVDMPGA DYQLTKLLGL RPYVKRYMMY QQGCFAGGTV LRLAKDLAEN
NKGARVLVVC SEVTAVTFRG PSDTHLDSLV GQALFGDGAA ALIVGSDPVP EIEKPIFEMV
WTAQTIAPDS EGAIDGHLRE AGLTFHLLKD VPGIVSKNIT KALVEAFEPL GISDYNSIFW
IAHPGGPAIL DQVEQKLALK PEKMNATREV LSEYGNMSSA CVLFILDEMR KKSTQNGLKT
TGEGLEWGVL FGFGPGLTIE TVVLRSVAI
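
For illustration, the substitution notation used above (for example, D96A) can be applied to SEQ ID NO:1 programmatically. The Python sketch below parses a substitution string, checks that the stated wild type residue is actually present at the stated position, and returns the mutated sequence. The name apply_substitution is hypothetical. The sequence is transcribed from Table 2 on the assumption that residues 91-100 read LDARQDMVVV, which is the reading consistent with the listed substitutions V98L, V99A, and V100M.

```python
import re

# SEQ ID NO:1 as transcribed from Table 2 (389 residues). Residues 91-100 are
# assumed to read LDARQDMVVV, consistent with substitutions V98L/V99A/V100M.
SEQ_ID_NO_1 = (
    "MVSVSEIRKAQRAEGPATILAIGTANPANCVEQSTYPDFYFKITNSEHKTELKEKFQRMC"
    "DKSMIKRRYMYLTEEILKENPNVCEYMAPSLDARQDMVVVEVPRLGKEAAVKAIKEWGQP"
    "KSKITHLIVCTTSGVDMPGADYQLTKLLGLRPYVKRYMMYQQGCFAGGTVLRLAKDLAEN"
    "NKGARVLVVCSEVTAVTFRGPSDTHLDSLVGQALFGDGAAALIVGSDPVPEIEKPIFEMV"
    "WTAQTIAPDSEGAIDGHLREAGLTFHLLKDVPGIVSKNITKALVEAFEPLGISDYNSIFW"
    "IAHPGGPAILDQVEQKLALKPEKMNATREVLSEYGNMSSACVLFILDEMRKKSTQNGLKT"
    "TGEGLEWGVLFGFGPGLTIETVVLRSVAI"
)


def apply_substitution(sequence, substitution):
    """Apply a point substitution such as 'D96A' (1-based position)."""
    match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", substitution)
    if match is None:
        raise ValueError(f"unrecognized substitution: {substitution!r}")
    wild_type, position, replacement = match.group(1), int(match.group(2)), match.group(3)
    if not 1 <= position <= len(sequence):
        raise ValueError(f"position {position} is outside the sequence")
    if sequence[position - 1] != wild_type:
        raise ValueError(
            f"expected {wild_type} at position {position}, "
            f"found {sequence[position - 1]}"
        )
    return sequence[:position - 1] + replacement + sequence[position:]


if __name__ == "__main__":
    mutant = SEQ_ID_NO_1
    for sub in ("D96A", "V98L", "M137L", "C164A"):
        mutant = apply_substitution(mutant, sub)
    print(mutant[90:100], mutant[136], mutant[163])
```

Checking the wild type residue before substituting is a deliberate design choice: it catches numbering or transcription discrepancies early instead of silently producing an unintended mutant.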

Accordingly, the polypeptides of the invention are intended to cover naturally
occurring proteins, as well as those which are recombinantly or synthetically
produced. Polypeptide or protein fragments are also encompassed by the invention.
Fragments can have the same or substantially the same amino acid sequence as the
naturally occurring protein. A polypeptide or peptide having substantially the same
sequence means that an amino acid sequence is largely, but not entirely, the same, but
retains a functional activity of the sequence to which it is related. In general, polypeptides
of the invention include peptides, or full-length proteins, that contain substitutions,
deletions, or insertions into the protein backbone and that would still have approximately
70%-90% homology to the original protein over the corresponding portion. A yet
greater degree of departure from homology is allowed if like-amino acids, i.e.,
conservative amino acid substitutions, do not count as a change in the sequence.

A polypeptide may be substantially related but for a conservative variation, such 
polypeptides being encompassed by the invention. A conservative variation denotes the
replacement of an amino acid residue by another, biologically similar residue. Examples
of conservative variations include the substitution of one hydrophobic residue such as
isoleucine, valine, leucine or methionine for another, or the substitution of one polar
residue for another, such as the substitution of arginine for lysine, glutamic for aspartic
acid, or glutamine for asparagine, and the like. Other illustrative examples of
conservative substitutions include the changes of: alanine to serine; arginine to lysine;
asparagine to glutamine or histidine; aspartate to glutamate; cysteine to serine; glutamine
to asparagine; glutamate to aspartate; glycine to proline; histidine to asparagine or
glutamine; isoleucine to leucine or valine; leucine to valine or isoleucine; lysine to
arginine, glutamine, or glutamate; methionine to leucine or isoleucine; phenylalanine to
tyrosine, leucine or methionine; serine to threonine; threonine to serine; tryptophan to
tyrosine; tyrosine to tryptophan or phenylalanine; valine to isoleucine or leucine, and the
like. The term "conservative variation" also includes the use of a substituted amino acid 
in place of an unsubstituted parent amino acid provided that antibodies raised to the 
substituted polypeptide also immunoreact with the unsubstituted polypeptide. 
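
The illustrative conservative substitutions enumerated above can be expressed, by way of example, as a lookup table keyed by the original residue. The Python sketch below uses one-letter codes and encodes only the pairs listed in the preceding paragraph; the names CONSERVATIVE_CHANGES and is_conservative are hypothetical, and since the list above ends with "and the like", a result of False means only that the change is not among the enumerated examples.

```python
# Conservative changes enumerated in the text: original residue -> replacements.
CONSERVATIVE_CHANGES = {
    "A": {"S"},
    "R": {"K"},
    "N": {"Q", "H"},
    "D": {"E"},
    "C": {"S"},
    "Q": {"N"},
    "E": {"D"},
    "G": {"P"},
    "H": {"N", "Q"},
    "I": {"L", "V"},
    "L": {"V", "I"},
    "K": {"R", "Q", "E"},
    "M": {"L", "I"},
    "F": {"Y", "L", "M"},
    "S": {"T"},
    "T": {"S"},
    "W": {"Y"},
    "Y": {"W", "F"},
    "V": {"I", "L"},
}


def is_conservative(original, replacement):
    """True if original -> replacement is one of the enumerated changes."""
    return replacement.upper() in CONSERVATIVE_CHANGES.get(original.upper(), set())


if __name__ == "__main__":
    # Residue pairs taken from substitutions listed for SEQ ID NO:1.
    for original, replacement in (("M", "L"), ("V", "L"), ("S", "T"), ("D", "A")):
        print(original, "->", replacement, is_conservative(original, replacement))
```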

Modifications and substitutions are not limited to replacement of amino acids.
For a variety of purposes, such as increased stability, solubility, or configuration
concerns, one skilled in the art will recognize the need to introduce (by deletion,
replacement, or addition) other modifications. Examples of such other modifications
include incorporation of rare amino acids, dextro-amino acids, glycosylation sites, and
cysteine for specific disulfide bridge formation. The modified peptides can be
chemically synthesized, or the isolated gene can be site-directed mutagenized, or a 
synthetic gene can be synthesized and expressed in bacteria, yeast, baculovirus, tissue 
culture and so on. 

Chalcone synthase polypeptides of the invention include synthase polypeptides
from plants, prokaryotes, and eukaryotes, including, for example, invertebrates, mammals
and humans, and include sequences as set forth in SEQ ID NO:1, as well as sequences
that have at least 50% homology, preferably at least 60% homology, more preferably at
least 70% homology to the sequence of SEQ ID NO:1, and fragments, variants, or
conservative substitutions of any of the foregoing sequences.

The term "variant" refers to polypeptides that are modified at one or more amino acid
residues yet still retain the biological activity of a synthase polypeptide. Variants can
be produced by any number of means known in the art, including, for example,
error-prone PCR, shuffling, oligonucleotide-directed
mutagenesis, assembly PCR, sexual PCR mutagenesis, and the like, as well as any
combination thereof.

By "substantially identical" is meant a polypeptide or nucleic acid exhibiting at 
least 50%, preferably 85%, more preferably 90%, and most preferably 95% homology 
to a reference amino acid or nucleic acid sequence. 

Sequence homology and identity are often measured using sequence analysis
software (e.g., Sequence Analysis Software Package of the Genetics Computer Group,
University of Wisconsin Biotechnology Center, 1710 University Avenue, Madison, WI
53705). The term "identity" in the context of two or more nucleic acid or polypeptide
sequences refers to two or more sequences or subsequences that are the same or have a
specified percentage of amino acid residues or nucleotides that are the same when
compared and aligned for maximum correspondence over a comparison window or
designated region, as measured using any number of sequence comparison algorithms or
by manual alignment and visual inspection. The term "homology" in the context of two
or more nucleic acid or polypeptide sequences refers to two or more sequences or
subsequences that are homologous or have a specified percentage of amino acid residues
or nucleotides that are homologous when compared and aligned for maximum
correspondence over a comparison window or designated region, as measured using any
number of sequence comparison algorithms or by manual alignment and visual
inspection. Programs such as those mentioned above allow substitutions with similar
amino acids to count as matches by assigning degrees of homology, and thereby
determine a degree of homology between the sequences being compared.

For sequence comparison, typically one sequence acts as a reference sequence, to 
which test sequences are compared. When using a sequence comparison algorithm, test 
and reference sequences are entered into a computer, subsequence coordinates are
designated, if necessary, and sequence algorithm program parameters are designated. 
Default program parameters can be used, or alternative parameters can be designated. 
The sequence comparison algorithm then calculates the percent sequence identities for 
the test sequences relative to the reference sequence, based on the program parameters. 
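
As an illustration of the percent identity calculation described above, the following is a minimal Python sketch. It assumes the two sequences have already been aligned (gaps written as "-") and that identity is counted only over columns where neither sequence has a gap. The function name and the choice of denominator are assumptions made for this example; alignment programs differ in how they treat gapped columns.

```python
def percent_identity(aligned_ref, aligned_test, gap="-"):
    """Percent identity of a test sequence relative to a reference sequence.

    Both arguments are rows of one pairwise alignment, so they must have the
    same length. Columns containing a gap in either row are skipped; this is
    one common convention, assumed here for illustration only.
    """
    if len(aligned_ref) != len(aligned_test):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for ref_res, test_res in zip(aligned_ref, aligned_test):
        if ref_res == gap or test_res == gap:
            continue  # gapped column: not counted in this convention
        compared += 1
        if ref_res.upper() == test_res.upper():
            matches += 1
    if compared == 0:
        return 0.0
    return 100.0 * matches / compared
```

Restricting the two input strings to a comparison window before calling the function gives the identity over that window rather than over the full alignment.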

A "comparison window", as used herein, includes reference to a segment of any
one of the number of contiguous positions selected from the group consisting of from 20
to 600, usually about 50 to about 200, more usually about 100 to about 150, in which a
sequence may be compared to a reference sequence of the same number of contiguous
positions after the two sequences are optimally aligned. Methods of alignment of
sequences for comparison are well-known in the art. Optimal alignment of sequences for
comparison can be conducted, e.g., by the local homology algorithm of Smith &
Waterman, Adv. Appl. Math. 2:482, 1981, by the homology alignment algorithm of
Needleman & Wunsch, J. Mol. Biol. 48:443, 1970, by the search for similarity method of
Pearson & Lipman, Proc. Natl. Acad. Sci. USA 85:2444, 1988, by computerized
implementations of these algorithms (GAP, BESTFIT, FASTA, and TFASTA in the
Wisconsin Genetics Software Package, Genetics Computer Group, 575 Science Dr.,
Madison, WI), or by manual alignment and visual inspection. Other algorithms for
determining homology or identity include, for example, in addition to a BLAST
program (Basic Local Alignment Search Tool at the National Center for Biotechnology
Information), ALIGN, AMAS (Analysis of Multiply Aligned Sequences), AMPS
(Protein Multiple Sequence Alignment), ASSET (Aligned Segment Statistical
Evaluation Tool), BANDS, BESTSCOR, BIOSCAN (Biological Sequence
Comparative Analysis Node), BLIMPS (BLocks IMProved Searcher), FASTA,
Intervals & Points, BMB, CLUSTAL V, CLUSTAL W, CONSENSUS,
LCONSENSUS, WCONSENSUS, Smith-Waterman algorithm, DARWIN, Las Vegas
algorithm, FNAT (Forced Nucleotide Alignment Tool), Framealign, Framesearch,
DYNAMIC, FILTER, FSAP (Fristensky Sequence Analysis Package), GAP (Global
Alignment Program), GENAL, GIBBS, GenQuest, ISSC (Sensitive Sequence
Comparison), LALIGN (Local Sequence Alignment), LCP (Local Content Program),
MACAW (Multiple Alignment Construction & Analysis Workbench), MAP (Multiple
Alignment Program), MBLKP, MBLKN, PIMA (Pattern-Induced Multi-sequence
Alignment), SAGA (Sequence Alignment by Genetic Algorithm) and WHAT-IF.
Such alignment programs can also be used to screen genome databases to identify
polynucleotide sequences having substantially identical sequences. A number of
genome databases are available; for example, a substantial portion of the human
genome is available as part of the Human Genome Sequencing Project (J. Roach,
http://weber.u.washington.edu/~roach/human_genome_progress2.html) (Gibbs,
1995). At least twenty-one other genomes have already been sequenced, including, for
example, M. genitalium (Fraser et al., 1995), M. jannaschii (Bult et al., 1996), H.
influenzae (Fleischmann et al., 1995), E. coli (Blattner et al., 1997), yeast (S.
cerevisiae) (Mewes et al., 1997), and D. melanogaster (Adams et al., 2000). Significant
progress has also been made in sequencing the genomes of model organisms, such as
mouse, C. elegans, and Arabidopsis sp. Several databases containing genomic
information annotated with some functional information are maintained by different
organizations, and are accessible via the internet, for example, http://www.tigr.org/tdb;
http://www.genetics.wisc.edu; http://genome-www.stanford.edu/~ball;
http://hiv-web.lanl.gov; http://www.ncbi.nlm.nih.gov; http://www.ebi.ac.uk;
http://Pasteur.fr/other/biology; and http://www.genome.wi.mit.edu.

Examples of useful algorithms are the BLAST and BLAST 2.0 algorithms, which
are described in Altschul et al., J. Mol. Biol. 215:403-410, 1990, and Altschul et al.,
Nuc. Acids Res. 25:3389-3402, 1997, respectively. Software for performing BLAST
analyses is publicly available through the National Center for Biotechnology
Information (http://www.ncbi.nlm.nih.gov). This algorithm involves first identifying
high scoring sequence pairs (HSPs) by identifying short words of length W in the query
sequence, which either match or satisfy some positive-valued threshold score T when
aligned with a word of the same length in a database sequence. T is referred to as the
neighborhood word score threshold (Altschul et al., supra). These initial neighborhood
word hits act as seeds for initiating searches to find longer HSPs containing them. The
word hits are extended in both directions along each sequence for as far as the
cumulative alignment score can be increased. Cumulative scores are calculated using,
for nucleotide sequences, the parameters M (reward score for a pair of matching
residues; always >0) and N (penalty score for mismatching residues; always <0). For
amino acid sequences, a scoring matrix is used to calculate the cumulative score.
Extension of the word hits in each direction is halted when: the cumulative alignment
score falls off by the quantity X from its maximum achieved value; the cumulative
score goes to zero or below, due to the accumulation of one or more negative-scoring
residue alignments; or the end of either sequence is reached. The BLAST algorithm
parameters W, T, and X determine the sensitivity and speed of the alignment. The
BLASTN program (for nucleotide sequences) uses as defaults a wordlength (W) of 11,
an expectation (E) of 10, M=5, N=-4 and a comparison of both strands. For amino acid
sequences, the BLASTP program uses as defaults a wordlength of 3, an expectation (E)
of 10, and the BLOSUM62 scoring matrix (see Henikoff & Henikoff, Proc. Natl. Acad.
Sci. USA 89:10915, 1992), alignments (B) of 50, expectation (E) of 10, M=5, N=-4,
and a comparison of both strands.
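
The extension step described above can be sketched in Python as follows. This is a simplified illustration, not the BLAST implementation: it extends a seed word hit in one direction only, without gaps, for nucleotide sequences. The function name, the argument names, and the default drop-off value of 20 are assumptions made for this example; the reward of 5 and penalty of -4 correspond to the M and N defaults mentioned in the text.

```python
def extend_right(query, subject, q_start, s_start, word_len,
                 reward=5, penalty=-4, x_drop=20):
    """Ungapped rightward extension of a seed word hit.

    Returns (best_score, best_length), where best_length counts residues
    from the start of the seed. Extension halts when the running score
    falls x_drop or more below its maximum, when the score reaches zero or
    below, or when the end of either sequence is reached.
    """
    # Score the seed word itself.
    score = 0
    for k in range(word_len):
        score += reward if query[q_start + k] == subject[s_start + k] else penalty
    best_score, best_len = score, word_len

    i, j = q_start + word_len, s_start + word_len
    while i < len(query) and j < len(subject):
        score += reward if query[i] == subject[j] else penalty
        i += 1
        j += 1
        if score > best_score:
            best_score, best_len = score, i - q_start
        elif best_score - score >= x_drop or score <= 0:
            break
    return best_score, best_len
```

A leftward extension would mirror this loop with decreasing indices, and the two results together would define the high scoring segment pair around the seed.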

The BLAST algorithm also performs a statistical analysis of the similarity
between two sequences (see, e.g., Karlin & Altschul, Proc. Natl. Acad. Sci. USA
90:5873, 1993). One measure of similarity provided by the BLAST algorithm is the
smallest sum probability (P(N)), which provides an indication of the probability by
which a match between two nucleotide or amino acid sequences would occur by chance.
For example, a nucleic acid is considered similar to a reference sequence if the smallest
sum probability in a comparison of the test nucleic acid to the reference nucleic acid is
less than about 0.2, more preferably less than about 0.01, and most preferably less than
about 0.001.

In one embodiment, protein and nucleic acid sequence homologies are
evaluated using the Basic Local Alignment Search Tool ("BLAST"). In particular, five
specific BLAST programs are used to perform the following tasks:

    (1) BLASTP and BLAST3 compare an amino acid query sequence
    against a protein sequence database;

    (2) BLASTN compares a nucleotide query sequence against a
    nucleotide sequence database;

    (3) BLASTX compares the six-frame conceptual translation
    products of a query nucleotide sequence (both strands) against a protein
    sequence database;

    (4) TBLASTN compares a query protein sequence against a
    nucleotide sequence database translated in all six reading frames (both
    strands); and

    (5) TBLASTX compares the six-frame translations of a nucleotide
    query sequence against the six-frame translations of a nucleotide sequence
    database.
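
The five tasks listed above amount to a lookup from the type of query and the type of database to a program name. A minimal Python sketch of that lookup follows; the dictionary layout, the function name, and the `translated` flag are hypothetical conveniences for this example and are not part of any BLAST distribution.

```python
# Hypothetical lookup keyed on (query type, database type, translated search?).
BLAST_PROGRAMS = {
    ("protein", "protein", False): "BLASTP",
    ("nucleotide", "nucleotide", False): "BLASTN",
    ("nucleotide", "protein", True): "BLASTX",      # query translated in six frames
    ("protein", "nucleotide", True): "TBLASTN",     # database translated in six frames
    ("nucleotide", "nucleotide", True): "TBLASTX",  # both translated in six frames
}


def choose_blast_program(query_type, database_type, translated=False):
    """Return the program name listed above for the given search type."""
    key = (query_type, database_type, translated)
    try:
        return BLAST_PROGRAMS[key]
    except KeyError:
        raise ValueError(f"no BLAST program listed for {key}") from None
```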

The BLAST programs identify homologous sequences by identifying similar
segments, which are referred to herein as "high-scoring segment pairs," between a
query amino or nucleic acid sequence and a test sequence which is preferably obtained
from a protein or nucleic acid sequence database. High-scoring segment pairs are
preferably identified (i.e., aligned) by means of a scoring matrix, many of which are
known in the art. Preferably, the scoring matrix used is the BLOSUM62 matrix
(Gonnet et al., Science 256:1443-1445, 1992; Henikoff and Henikoff, Proteins
17:49-61, 1993). Less preferably, the PAM or PAM250 matrices may also be used (see,
e.g., Schwartz and Dayhoff, eds., 1978, Matrices for Detecting Distance Relationships:
Atlas of Protein Sequence and Structure, Washington: National Biomedical Research
Foundation). BLAST programs are accessible through the U.S. National Library of
Medicine, e.g., at www.ncbi.nlm.nih.gov.

The parameters used with the above algorithms may be adapted depending on
the sequence length and degree of homology studied. In some embodiments, the
parameters may be the default parameters used by the algorithms in the absence of
instructions from the user.

By a "substantially pure polypeptide" is meant a synthase polypeptide (e.g., a
chalcone synthase) which has been separated from components which naturally
accompany it. Typically, the polypeptide is substantially pure when it is at least 60%,
by weight, free from the proteins and naturally-occurring organic molecules with
which it is naturally associated. Preferably, the preparation is at least 75%, more
preferably at least 90%, and most preferably at least 99%, by weight, synthase
polypeptide. A substantially pure synthase polypeptide may be obtained, for example,
by extraction from a natural source; by expression of a recombinant nucleic acid
encoding a synthase polypeptide; or by chemically synthesizing the protein. Purity
can be measured by any appropriate method (e.g., column chromatography,
polyacrylamide gel electrophoresis, or HPLC analysis).

One aspect of the invention resides in obtaining crystals of the synthase
polypeptide, chalcone synthase, of sufficient quality to determine the
three-dimensional (tertiary) structure of the protein by X-ray diffraction methods. The
knowledge obtained concerning the three-dimensional structure of chalcone synthase
can be used in the determination of the three-dimensional structure of other synthase
polypeptides in the polyketide synthesis pathway. The structural coordinates of
chalcone synthase can be used to develop new polyketide synthesis enzymes or
synthase inhibitors using various computer models. Based on the structural
coordinates of the chalcone synthase polypeptide (e.g., the three-dimensional protein
structure), as described herein, novel polyketide synthases can be engineered. In
addition, small molecules which mimic or are capable of interacting with a functional
domain of a synthase molecule can be designed and synthesized to modulate the
biological functions of chalcone synthase, pyrone synthase, and other polyketide
synthases. Accordingly, in one embodiment, the invention provides a method of
"rational" enzyme or drug design. Another approach to "rational" enzyme or drug
design is based on a lead compound that is discovered using high throughput screens;
the lead compound is further modified based on a crystal structure of the binding
regions of the molecule in question. Accordingly, another aspect of the invention is to
provide related protein sequences or material which is a starting material in the
rational design of new synthases or drugs which lead to the synthesis of new
polyketides or modify the polyketide synthesis pathway.

"Active Site" refers to a site in a synthase defined by amino acid residues that
interact with substrate and facilitate a biosynthetic reaction that allows one or more
products to be produced. For example, an active site comprises α-carbon atoms
that are indirectly linked via peptide bonds and have the structural coordinates disclosed
in Table 1 ± 2.3 angstroms. Other active site amino acids for chalcone synthase include
C164, H303, and N336. The position in three-dimensional space of an α-carbon at the
active site of a synthase and of R-groups associated therewith can be determined using
techniques such as three-dimensional modeling, X-ray crystallography, and/or
techniques associated therewith. Active sites can be specified by a set of amino acid
residues. Other residues can play a role in substrate specificity and enzyme activity
by modulating the size, shape, charge, and the like of the active site. In addition,
second tier residues may also modulate the specificity and/or activity of the enzyme.

In CHS, at least five areas of primary sequence contain residues that play a
role in modulating enzyme specificity and/or activity. Each area contains a
total of about four to about fifteen amino acid residues. Within each area, about three
to six, and preferably four or five, amino acid residues that interact with substrate are
found. Residues may be directly within or lining the active site to modulate
specificity and/or activity. Residues may also be involved in second tier interactions
that modulate the specificity and/or activity of the active site, without being physically
located within the active site. Various mutants of these residues have been prepared
to evaluate the role of these residues in CHS function and activity, including substrate
specificity and product formation. Table 3 presents some of the mutations of CHS
that have been made to affect CHS function.



TABLE 3 - Mutants of CHS

Mutant Name     Mutant Code       Mutations relative to alfalfa CHS
------------    --------------    ------------------------------------------
A4              0002              A4 (L268K, K269G, D270A, G273D)

14B (=6xCHS)    1200              A1 (V98L)
                                  A2 (T131S, S133T, G134T, V135P, M137L)

2B              2200              A1 (D96A, V98L, V99A, V100M)
                                  A2 (T131S, S133T, G134T, V135P, M137L)

16B (=8xCHS)    1210              A1 (V98L)
                                  A2 (T131S, S133T, G134T, V135P, M137L)
                                  A3 (M158G, Y160F)

4B              2211              A1 (D96A, V98L, V99A, V100M)
                                  A2 (T131S, S133T, G134T, V135P, M137L)
                                  A3 (M158G, Y160F)
                                  A4 (K269G)

6B              1220              A1 (V98L)
                                  A2 (T131S, S133T, G134T, V135P, M137L)
                                  A3 (Y157V, M158G, M159V, Y160F, Q165H)

18xCHS          2222              A1 (D96A, V98L, V99A, V100M)
                                  A2 (T131S, S133T, G134T, V135P, M137L)
                                  A3 (Y157V, M158G, M159V, Y160F, Q165H)
                                  A4 (L268K, K269G, D270A, G273D)

22xCHS          2222 + Area B1    A1 (D96A, V98L, V99A, V100M)
                                  A2 (T131S, S133T, G134T, V135P, M137L)
                                  A3 (Y157V, M158G, M159V, Y160F, Q165H)
                                  B1 (D255G, H257K, L258V, H266Q)
                                  A4 (L268K, K269G, D270A, G273D)

A polyketide synthase can be divided into regional areas A1-A4 and B1.
Areas A1 and A3 flank area A2 from below and above, respectively (see Figure 15).
Both areas seem to be important mainly for compensatory steric changes
that allow a proline-induced kink in area A2 relative to the CHS position. The
backbone C-alpha traces of A1 and A3 do not actually vary much from CHS to STS,
but the length of the indicated residues does. In area A1, amino acids involved in the
modulation of enzyme specificity and/or activity for chalcone synthase include D96,
V98, V99 and V100. In area A3, such amino acids include Y157, M158, M159, Y160
and Q165. Mutations at V98 and V99 in area A1, and at M158 and Y160 in area A3,
appear especially important for modifying activity.

Area A2 appears to be the most important area, and is located at the dimer
interface, directly between the active site cavities of each monomer. In area A2, amino
acids involved in the modulation of enzyme specificity and/or activity include T131,
S133, G134, V135 and M137. Mutations at G134 and V135 appear especially
important for modifying activity.

Area A4 is located on the outside of the protein, near the active site entrance.
A4 mutations made to wild type CHS seem to have no effect on cyclization specificity,
indicating that this area is not important to the conversion of activity seen in certain
mutants, for example, the 18xCHS mutant. However, this area may be important in the
improvements to conversion seen with the addition of four more mutations (at B1, see
Figure 16 and below) in the 22xCHS mutant. In area A4, amino acids involved in the
modulation of enzyme specificity and/or activity include L268, K269, D270 and
G273.

Area B1 flanks A4 and bridges the gap between A1-A3 and A4. In area B1,
amino acids involved in the modulation of enzyme specificity and/or activity include
D255, H257, L258 and H266. These mutations are in an area predicted by Ferrer
et al. and are important for cyclization specificity.

"Altered substrate specificity" or "altered activity" includes a change in the
ability of a mutant synthase to use a particular substrate and/or produce a polyketide
product as compared to a non-mutated synthase. Altered substrate specificity may
include the ability of a synthase to exhibit different enzymatic parameters relative to a
non-mutated synthase (Km, Vmax, etc.), use different substrates, and/or produce
products that are different from those of known synthases.

"Structure coordinates" refers to Cartesian coordinates (x, y, and z positions)
derived from mathematical equations involving Fourier synthesis as determined from
patterns obtained via diffraction of a monochromatic beam of X-rays by the atoms
(scattering centers) of a polyketide synthase molecule in crystal form. Diffraction data
are used to calculate electron density maps of repeating protein units in the crystal
(unit cell). Electron density maps are used to establish the positions of individual
atoms within a crystal's unit cell. The term "crystal structure coordinates" refers to
mathematical coordinates derived from mathematical equations related to the patterns
obtained on diffraction of a monochromatic beam of X-rays by the atoms (scattering
centers) of a synthase polypeptide (e.g., a chalcone synthase protein molecule) in
crystal form. The diffraction data are used to calculate an electron density map of the
repeating unit of the crystal. The electron density maps are used to establish the
positions of the individual atoms within the unit cell of the crystal. The crystal
structure coordinates of a synthase can be obtained from crystals and can also be
obtained by means of computational analysis.

The term "selenomethionine substitution" refers to the method of producing a
chemically modified form of the crystal of a synthase (e.g., a chalcone synthase). The
synthase protein is expressed by bacteria in medium that is depleted in methionine and
supplemented with selenomethionine. Selenium is thereby incorporated into the crystal
in place of methionine sulfurs. The location(s) of selenium are determined by X-ray
diffraction analysis of the crystal. This information is used to generate the phase
information used to construct a three-dimensional structure of the protein.

"Heavy atom derivatization" refers to a method of producing a chemically
modified form of a synthase crystal. In practice, a crystal is soaked in a solution
containing heavy atom salts or organometallic compounds, e.g., lead chloride, gold
thiomalate, thimerosal, uranyl acetate, and the like, which can diffuse through the
crystal and bind to the protein's surface. Locations of the bound heavy atoms can be
determined by X-ray diffraction analysis of the soaked crystal. This information is
then used to construct phase information which can then be used to construct
three-dimensional structures of the enzyme as described in Blundell, T. L., and
Johnson, L. N., Protein Crystallography, Academic Press (1976), which is incorporated
by reference herein.

WO 02/057418    PCT/US01/48523

"Unit cell" refers to a basic parallelepiped-shaped block. Regular assembly of
such blocks may construct the entire volume of a crystal. Each unit cell comprises a
complete representation of the unit pattern, the repetition of which builds up the
crystal.

"Mutagenesis" refers to the changing of one R-group for another as defined
herein. This can be most easily performed by changing the coding sequence of the
nucleic acid encoding the amino acid residue. In the context of the present invention,
mutagenesis does not change the carbon coordinates beyond the limits defined herein.

"Space Group" refers to the arrangement of symmetry elements within a crystal.

"Molecular replacement" refers to generating a preliminary model of a
polyketide synthase whose structural coordinates are unknown, by orienting and
positioning a molecule whose structural coordinates are known within the unit cell of
the unknown crystal so as best to account for the observed diffraction pattern of the
unknown crystal. Phases can then be calculated from this model and combined with
the observed amplitudes to give an approximate Fourier synthesis of the structure
whose coordinates are unknown. This in turn can be subject to any of the several
forms of refinement to provide a final, accurate structure of the unknown crystal
(Lattman, E., 1985, in Methods in Enzymology, 115:55-77; Rossmann, M.G., ed.,
"The Molecular Replacement Method" 1972, Int. Sci. Rev. Ser., No. 13, Gordon &
Breach, New York). Using structure coordinates of the polyketide synthase provided
herein (see, e.g., PDB Accession Numbers), molecular replacement may be used to
determine the structural coordinates of a crystalline mutant, homologue, or a different
crystal form of polyketide synthase.



A "synthase" or a "polyketide synthase" includes any one of a family of
enzymes that catalyze the formation of polyketide compounds. Polyketide synthases
are generally homodimers, with each monomer being enzymatically active.

"Substrate" refers to the Coenzyme-A (CoA) thioesters that are acted on by the
polyketide synthases and mutants thereof disclosed herein, such as malonyl-CoA,
coumaroyl-CoA, hexanoyl-CoA, ACP or NAC thioesters and the like.

The present invention relates to crystallized polyketide synthases and mutants
thereof from which the position of specific α-carbon atoms and R-groups associated
therewith comprising the active site can be determined in three-dimensional space.
The invention also relates to structural coordinates of said polyketide synthases, use of
said structural coordinates to develop structural information related to polyketide
synthase homologues, mutants, and the like, and to crystal forms of such synthases.
Furthermore, the invention, as disclosed herein, provides a method whereby said
α-carbon structural coordinates specifically determined for atoms comprising the active
site of said synthase, as shown in Table 1 and including C164, H303, and N336, can
be used to develop synthases wherein R-groups associated with active site α-carbon
atoms are different from the R-groups found in native CHS, e.g., are mutant
synthases. In addition, the present invention provides for production of mutant
polyketide synthases based on the structural information of synthases (and provided
herein) and for use of said mutant synthases to make a variety of polyketide
compounds using a variety of substrates (as described in PCT Application
US00/20674, filed July 27, 2000, incorporated by reference in its entirety herein). The
present invention also provides methods of producing novel mutant polyketide
synthases by comparing the crystal structures of two different polyketide synthases.

The present invention further provides, for the first time, crystals of several
polyketide synthases, as exemplified by chalcone synthase (CHS; PDB Accession No.
1BI5), stilbene synthase (STS; Pinus sylvestris, pine - Appendix A; and Arachis
hypogaea, peanut - Appendix B), and pyrone synthase (2-PS; PDB Accession No.
1EE0). Also provided are coordinates for crystals which are grown in the presence
and absence of substrate, substrate analogues, and products, thus allowing definition
of the structural or atomic coordinates associated therewith. Said structural
coordinates allow determination of the α-carbon atoms comprising the active site, the
R-groups associated therewith, and the interaction of said α-carbons and said R-groups
with each other. For example, Table 4 identifies various substrates and products that
were co-crystallized with chalcone synthase as well as their PDB accession numbers,
all of which are incorporated by reference herein in their entirety.



TABLE 4

Complex                        PDB Accession No.
---------------------------    -----------------
CHS-CoA complex                1BQ6
CHS-malonyl-CoA complex        1CML
CHS-hexanoyl-CoA complex       1CHW
CHS-naringenin complex         1CGK
CHS-resveratrol complex        1CGZ



The crystals of the present invention belong to the tetragonal space group. The
unit cell dimensions vary by a few angstroms between crystals, but on average the
crystals belong to the space groups and have the unit cell dimensions shown in Table 5.

TABLE 5 - Crystals of Polyketide Synthases

                                Unit Cell Dimensions
Crystal         Space Group   a (Å)     b (Å)     c (Å)     α (°)    β (°)      γ (°)
-------------   -----------   -------   -------   -------   ------   --------   ------
CHS (alfalfa)   P3(2)21       97.54     97.54     65.52     90.00    90.00      120.00
STS (pine)      P2(1)         57.221    361.291   57.317    90.00    98.39      90.00
STS (peanut)    P2(1)         74.348    101.747   113.609   90.00    108.84     90.00
2-PS            P3(1)21       83.41     83.41     240.62    90.00    90.00      120.00
18xCHS          P2(1)         71.638    59.753    82.539    90.00    108.166    90.00
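
The six cell parameters in Table 5 determine the unit cell volume through the standard general (triclinic) formula V = abc·sqrt(1 - cos²α - cos²β - cos²γ + 2·cosα·cosβ·cosγ). A minimal Python sketch follows; the function name is an assumption made for this example, and the volume itself is not a value reported in the source.

```python
import math


def unit_cell_volume(a, b, c, alpha, beta, gamma):
    """Unit cell volume in cubic angstroms.

    a, b, c are edge lengths in angstroms; alpha, beta, gamma are the cell
    angles in degrees. Uses the general formula valid for any crystal system.
    """
    cos_a = math.cos(math.radians(alpha))
    cos_b = math.cos(math.radians(beta))
    cos_g = math.cos(math.radians(gamma))
    factor = 1.0 - cos_a ** 2 - cos_b ** 2 - cos_g ** 2 + 2.0 * cos_a * cos_b * cos_g
    return a * b * c * math.sqrt(factor)
```

For the CHS (alfalfa) entry, `unit_cell_volume(97.54, 97.54, 65.52, 90.0, 90.0, 120.0)` reduces to a·b·c·sin(120°), since the other two angles are 90°.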

Crystal structures are preferably obtained at a resolution of about 1.56
angstroms to about 3 angstroms for a polyketide synthase in the presence and in the
absence of bound substrate or substrate analog. Coordinates for a polyketide synthase
in the absence of a substrate bound in the active site have been deposited at the
Protein Data Bank, accession number 1BI5. Those skilled in the art understand that a
set of structure coordinates determined by X-ray crystallography is not without
standard error. Therefore, for the purpose of this invention, any set of structure
coordinates wherein the active site α-carbons of a polyketide synthase, synthase
homologue, or mutants thereof, have a root mean square deviation of less than 2.3
angstroms when superimposed using the structural coordinates listed in Table 1 and
PDB Accession No. 1BI5, shall be considered identical.
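
The root mean square deviation criterion above can be illustrated with a short Python sketch. It assumes the two sets of active site α-carbon coordinates are already superimposed and listed in the same residue order; the superposition step itself (for example, a least-squares fit) is not shown. The function names are hypothetical, and the 2.3 angstrom cutoff is the value stated in the text.

```python
import math


def rmsd(coords_a, coords_b):
    """Root mean square deviation between two matched lists of (x, y, z) points.

    The coordinate sets are assumed to be superimposed already and to list
    equivalent atoms in the same order.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))


def considered_identical(coords_a, coords_b, cutoff=2.3):
    """True when the RMSD of the matched alpha-carbons is below the cutoff (angstroms)."""
    return rmsd(coords_a, coords_b) < cutoff
```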

A schematic representation of the three-dimensional shape of a CHS
homodimer is shown in Figure 2a, which was prepared with MOLSCRIPT (Kraulis, J.
Appl. Crystallogr. 24:946-950, 1991). CHS functions as a homodimer of two 42 kDa
polypeptides. The structure of CHS reveals that the enzyme forms a symmetric dimer
with each monomer related by a 2-fold crystallographic axis. The dimer interface
buries approximately 1580 square angstroms, with interactions occurring along a fairly
flat surface. Two distinct structural features delineate the ends of this interface. First, the
N-terminal helix of monomer A entwines with the corresponding helix of monomer B.
Second, a tight loop containing a cis-peptide bond between Met137 and Pro138 exposes
the methionine sidechain as a knob on the monomer surface. Across the interface,
Met137 protrudes into a hole found in the surface of the adjoining monomer to form
part of the cyclization pocket (discussed below).

The CHS homodimer contains two functionally independent active sites
(Tropf et al., J. Biol. Chem. 270:7922-7928, 1995). Consistent with this information,
bound CoA thioesters and product analogs occupy both active sites of the homodimer
in the CHS complex structures. These structures identify the location of the active
site at the cleft between the upper and lower domains of each monomer. Each active
site consists almost entirely of residues from a single monomer, with Met137 from the
adjoining monomer being the only exception. A detailed description of the active site
structure is presented in the Examples section, below.

An isolated polyketide synthase of the invention comprises at least fourteen
active site α-carbons having the structural coordinates of Table 1 ± 2.3 angstroms. The
active site α-carbons of Table 1 generally are not all contiguous, i.e., are not adjacent
to one another in the primary amino acid sequence of a polyketide synthase due to
intervening amino acid residues between various active site α-carbons. Nevertheless,
it should be appreciated that certain active site α-carbons can be adjacent to one
another in some instances. Active site α-carbons are numbered in Table 1 for
convenience only and may be situated in any suitable order in the primary amino acid
sequence that achieves the structural coordinates given in Table 1.

An appropriate combination of R-groups, linked to active site α-carbons, can
facilitate the formation of one or more desired reaction products. The combination of
R-groups selected for use in a synthase can be any combination other than the ordered
arrangements of R-groups found in known native isolated polyketide synthases.
Typically, R-groups found on active site α-carbons are those found in naturally
occurring amino acids. In some embodiments, however, R-groups other than those
found in naturally occurring amino acids can be used.

The present invention permits the use of molecular design techniques to
design, select, and synthesize mutant polyketide synthases that produce different
and/or novel polyketide compounds using the same substrates. Mutant proteins of the
present invention and nucleic acids encoding the same can be designed by genetic
manipulation based on structural information about polyketide synthases. For
example, one or more R-groups associated with the active site α-carbon atoms of CHS
can be changed by altering the nucleotide sequence of the corresponding CHS gene,
thus making one or more mutant polyketide synthases. Such genetic manipulations
can be guided by structural information concerning the R-groups found on the active
site α-carbons when substrate is bound to the protein upon crystallization.
Alternatively, mutant polyketide synthases can be prepared by standard protocols for
polypeptide synthesis as is well known in the art.

Mutant proteins of the present invention may be prepared in a number of ways
available to the skilled artisan. For example, the gene encoding wild-type CHS may
be mutated at those sites identified herein as corresponding to amino acid residues
identified in the active site by means currently available to the artisan skilled in
molecular biology techniques. Said techniques include oligonucleotide-directed
mutagenesis, deletion, chemical mutagenesis, and the like. The protein encoded by
the mutant gene is then produced by expressing the gene in, for example, a bacterial or
plant expression system.

Alternatively, polyketide synthase mutants may be generated by site-specific
replacement of a particular amino acid with a non-naturally occurring amino acid. As
such, polyketide synthase mutants may be generated through replacement of an amino
acid residue or a particular cysteine or methionine residue with selenocysteine or
selenomethionine. This may be achieved by growing a host organism capable of
expressing either the wild-type or mutant polypeptide on a growth medium depleted of
natural cysteine or methionine or both and growing on medium enriched with either
selenocysteine, selenomethionine, or both. These and similar techniques are described
in Sambrook et al., (Molecular Cloning, A Laboratory Manual, 2nd Ed. (1989) Cold
Spring Harbor Laboratory Press).

Another suitable method of creating mutant synthases of the present invention
is based on a procedure described in Noel and Tsai (1989) J. Cell. Biochem.,
40:309-320. In so doing, the nucleic acids encoding said polyketide synthase can be
synthetically produced using oligonucleotides having overlapping regions, said
oligonucleotides being degenerate at specific bases so that mutations are induced.
Alternatively, traditional methods of protein or polypeptide synthesis may be used.
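
By way of illustration only, the following Python sketch enumerates the concrete
sequences represented by an oligonucleotide that is degenerate at specific bases.
The IUPAC ambiguity codes are standard. The example oligonucleotide, the choice of
an NNK codon, and the function name are hypothetical and are not taken from the
cited procedure.

```python
from itertools import product

# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

def expand_degenerate(oligo):
    """Return every unambiguous sequence encoded by a degenerate oligonucleotide."""
    pools = [IUPAC[base] for base in oligo.upper()]
    return ["".join(bases) for bases in product(*pools)]

# Hypothetical oligonucleotide: one NNK codon flanked by fixed codons.
variants = expand_degenerate("GGTNNKACC")
print(len(variants), "sequences, e.g.", variants[:3])
```

Restricting the third codon position to K (G or T) is one commonly used way of
limiting the number of variants while still encoding all twenty amino acids.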

According to the present invention, nucleic acid sequences encoding a mutated
polyketide synthase can be produced by the methods described herein, or any
alternative methods available to the skilled artisan. In designing the nucleic acid
sequence of interest, it may be desirable to reengineer said gene for improved
expression in a particular expression system. For example, it has been shown that
many bacterially derived genes do not express well in plant systems. In some cases,
plant-derived genes do not express well in bacteria. This phenomenon may be due to
the non-optimal G+C content and/or A+T content of said gene relative to the
expression system being used. For example, the very low G+C content of many
bacterial genes results in the generation of sequences mimicking or duplicating plant
gene control sequences that are highly A+T rich. The presence of A+T rich sequences
within the genes introduced into plants (e.g., TATA box regions normally found in
promoters) may result in aberrant transcription of the gene(s). In addition, the
presence of other regulatory sequences residing in the transcribed mRNA (e.g.,
polyadenylation signal sequences (AAUAAA) or sequences complementary to small
nuclear RNAs involved in pre-mRNA splicing) may lead to RNA instability.
Therefore, one goal in the design of genes is to generate nucleic acid sequences that
have a G+C content that affords mRNA stability and translation accuracy for a
particular expression system.
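
By way of illustration only, the following Python sketch computes the G+C fraction
of a coding sequence and locates motifs of the kind mentioned above (the DNA form
AATAAA of the AAUAAA polyadenylation signal, and a TATA-like stretch). The toy
sequence and the function names are hypothetical.

```python
def gc_fraction(seq):
    """Fraction of bases in `seq` that are G or C."""
    seq = seq.upper()
    return sum(seq.count(base) for base in "GC") / len(seq)

def find_motif(seq, motif):
    """Return 0-based start positions of every (possibly overlapping) motif hit."""
    seq, motif = seq.upper(), motif.upper()
    hits = []
    start = seq.find(motif)
    while start != -1:
        hits.append(start)
        start = seq.find(motif, start + 1)
    return hits

# Toy A+T rich coding sequence, written codon by codon for readability.
coding = "".join(["ATG", "GCT", "AAT", "AAA", "GGT", "TAT", "ATA", "AGC", "TAA"])

print(f"G+C fraction: {gc_fraction(coding):.2f}")
print("AATAAA at:", find_motif(coding, "AATAAA"))
print("TATA at:", find_motif(coding, "TATA"))
```

A sequence flagged in this way could then be redesigned with synonymous codons so
that the encoded protein is unchanged while the unwanted motif is removed.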

Due to the plasticity afforded by the redundancy of the genetic code (i.e., many
amino acids are specified by more than one codon), evolution of the genomes of
different organisms or classes of organisms has resulted in differential usage of
redundant codons. This "codon bias" is reflected in the mean base composition of
protein coding regions. For example, organisms with relatively low G+C contents
utilize codons having A or T in the third position of redundant codons, whereas those
having higher G+C contents utilize codons having G or C in the third position.
Therefore, in reengineering genes for expression, one may wish to determine the
codon bias of the organism in which the gene is to be expressed. Looking at the usage
of the codons as determined for genes of a particular organism deposited in GenBank
can provide this information. After determining the bias thereof, the new gene
sequence can be analyzed for restriction enzyme sites as well as other sites that could
affect transcription such as exon:intron junctions, polyA addition signals, or RNA
polymerase termination signals.
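
By way of illustration only, the following Python sketch tallies codon usage in a
coding sequence and reports the fraction of codons ending in G or C, which is one
simple measure of the third-position bias described above. The toy sequence and the
function names are hypothetical; in practice the tally would be run over many coding
sequences from the organism of interest.

```python
from collections import Counter

def codon_counts(cds):
    """Count each codon in an in-frame coding sequence."""
    cds = cds.upper()
    if len(cds) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of three")
    return Counter(cds[i:i + 3] for i in range(0, len(cds), 3))

def gc3_fraction(cds):
    """Fraction of codons whose third base is G or C."""
    counts = codon_counts(cds)
    total = sum(counts.values())
    gc3 = sum(n for codon, n in counts.items() if codon[2] in "GC")
    return gc3 / total

# Toy coding sequence, written codon by codon for readability.
toy_cds = "".join(["ATG", "GCT", "GCC", "GCA", "GCG", "CTG", "CTT", "TAA"])

print(codon_counts(toy_cds).most_common(3))
print(f"GC3 fraction: {gc3_fraction(toy_cds):.2f}")
```
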
Genes encoding polyketide synthases can be placed in an appropriate vector,
depending on the artisan's interest, and can be expressed using a suitable expression
system. An expression vector, as is well known in the art, typically includes elements
that permit replication of said vector within the host cell and may contain one or more
phenotypic markers for selection of cells containing said gene. The expression vector
will typically contain sequences that control expression such as promoter sequences,
ribosome binding sites, and translational initiation and termination sequences.
Expression vectors may also contain elements such as subgenomic promoters, a
repressor gene or various activator genes. The artisan may also choose to include
nucleic acid sequences that result in secretion of the gene product, movement of said
product to a particular organelle such as a plant plastid (see U.S. Patent Nos.
4,762,785; 5,451,513 and 5,545,817, which are incorporated by reference herein) or
other sequences that increase the ease of peptide purification, such as an affinity tag.

A wide variety of expression control sequences are useful in expressing the
mutated polyketide synthases when operably linked thereto. Such expression control
sequences include, for example, the early and late promoters of SV40 for animal cells,
the lac system, the trp system, major operator and promoter systems of phage λ, and
the control regions of coat proteins, particularly those from RNA viruses in plants. In
E. coli, a useful transcriptional control sequence is the T7 RNA polymerase binding
promoter, which can be incorporated into a pET vector as described by Studier et al.,
(1990) Methods in Enzymology, 185:60-89, which is incorporated by reference herein.

For expression, a desired gene should be operably linked to the expression
control sequence and maintain the appropriate reading frame to permit production of
the desired polyketide synthase. Any of a wide variety of well-known expression
vectors is of use in the present invention. These include, for example, vectors
comprising segments of chromosomal, non-chromosomal and synthetic DNA
sequences such as those derived from SV40, bacterial plasmids (including those from
E. coli such as col E1, pCR1, pBR322 and derivatives thereof, pMB9), wider host
range plasmids such as RP4, phage DNA such as phage λ, NM989, M13, and other
such systems as described by Sambrook et al., (Molecular Cloning, A Laboratory
Manual, 2nd Ed. (1989) Cold Spring Harbor Laboratory Press), which is incorporated
by reference herein.
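
By way of illustration only, the following Python sketch checks whether an insert
would be read as a single open reading frame from an expected start position, which
is one simple way of confirming that the appropriate reading frame has been
maintained. The toy sequences, the `start_offset` parameter, and the function name
are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def in_frame_problems(insert, start_offset=0):
    """List reasons why `insert` is not one open reading frame from `start_offset`."""
    seq = insert.upper()[start_offset:]
    problems = []
    if not seq.startswith("ATG"):
        problems.append("no ATG start codon at the expected position")
    if len(seq) % 3 != 0:
        problems.append("length is not a multiple of three")
    # Split into complete codons only; a trailing partial codon is ignored here.
    codons = [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]
    for index, codon in enumerate(codons[:-1]):
        if codon in STOP_CODONS:
            problems.append(f"premature stop codon {codon} at codon {index + 1}")
    if not codons or codons[-1] not in STOP_CODONS:
        problems.append("no stop codon at the end")
    return problems

in_frame = "ATGGCTTGCTAA"   # ATG GCT TGC TAA
shifted = "ATGGGCTTGCTAA"   # one extra G shifts the downstream frame

print(in_frame_problems(in_frame))
print(in_frame_problems(shifted))
```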

A wide variety of host cells are available for expressing synthase mutants of
the present invention. Such host cells include, for example, bacteria such as E. coli,
Bacillus and Streptomyces, fungi, yeast, animal cells, plant cells, insect cells, and the
like. Preferred embodiments of the present invention include chalcone synthase
mutants that are expressed in E. coli or in plant cells. Said plant cells can either be in
suspension culture or a transgenic plant as further described herein.

As stated previously, genes encoding synthases of the present invention can be
expressed in transgenic plant cells. In order to produce transgenic plants, vectors
containing the nucleic acid construct encoding polyketide synthases and mutants
thereof are inserted into the plant genome. Preferably, these recombinant vectors are
capable of stable integration into the plant genome. One variable in making a
transgenic plant is the choice of a selectable marker. A selectable marker is used to
identify transformed cells against a high background of untransformed cells. The
preference for a particular marker is at the discretion of the artisan, but any of the
selectable markers may be used along with any other gene not listed herein that could
function as a selectable marker. Such selectable markers include the aminoglycoside
phosphotransferase gene of transposon Tn5 (Aph II) (which encodes resistance to the
antibiotic kanamycin), genes encoding resistance to neomycin or G418, as well as
those genes which code for resistance or tolerance to glyphosate, hygromycin,
methotrexate, phosphinothricin, imidazolinones, sulfonylureas, triazolopyrimidine
herbicides, such as chlorsulfuron, bromoxynil, dalapon, and the like. In addition to a
selectable marker, it may be desirable to use a reporter gene. In some instances a
reporter gene may be used with a selectable marker. Reporter genes allow the
detection of transformed cells and may be used at the discretion of the artisan. A list
of these reporter genes is provided in K. Weising et al., 1988, Ann. Rev. Genetics,
22:421.

Said genes are expressed by promoters expressing in all tissues at all
times (constitutive promoters), by promoters expressing in specific tissues
(tissue-specific promoters), by promoters expressing at specific stages of development
(developmental promoters), and/or by promoters expressing in response to a stimulus or
stimuli (inducible promoters). The choice of these is at the discretion of the artisan.

Several techniques exist for introducing foreign genes into plant cells, and for
obtaining plants that stably maintain and express the introduced gene. Such
techniques include acceleration of genetic material coated on a substrate directly into
cells (U.S. Patent 4,945,050 to Cornell). Plant cells may also be transformed using
Agrobacterium technology (see, for example, U.S. Patents 5,177,010 to University of
Toledo, 5,104,310 to Texas A&M, U.S. Patents 5,149,645, 5,469,976, 5,464,763,
4,940,838, and 4,693,976 to Schilperoort, European Patent Applications 116718,
290799, 320500 to Max Planck, European Patent Applications 604662, 627752 and
U.S. Patent 5,591,616 to Japan Tobacco, European Patent Applications 0267159,
0292435 and U.S. Patent 5,231,019 to Ciba-Geigy, U.S. Patents 5,463,174 and
4,762,785 to Calgene, and U.S. Patents 5,004,863 and 5,159,135 to Agracetus). Other
transformation technologies include whiskers technology (see U.S. Patents 5,302,523
and 5,464,765 to Zeneca). Electroporation technology has also been used to transform
plants (see WO 87/06614 to Boyce Thompson Institute, 5,472,869 and 5,384,253 to
Dekalb, and WO 92/09696 and WO 93/21335 to Plant Genetic Systems, all of which are
incorporated by reference). Viral vector expression systems can also be used, such as
those described in U.S. Patents 5,316,931, 5,589,367, 5,811,653, and 5,866,785 to
BioSource, which are incorporated by reference herein.

In addition to numerous technologies for transforming plants, the type of tissue
that is contacted with the genes of interest may vary as well. Suitable tissue includes,
for example, embryonic tissue, callus tissue, hypocotyl, meristem, and the like.
Almost all plant tissues may be transformed during de-differentiation using the
appropriate techniques described herein.

In addition, it may be desirable to change the polyketide production of a
polyketide synthase within a plant. For example, it may be beneficial to increase the
production of resveratrol in a plant. Resveratrol, the natural product made by the
CHS-related stilbene synthase (STS) enzymes, is an antifungal compound produced in
a few families of plants, including pine trees, grapevines, and peanuts. When stilbene
synthase is introduced into plants like tobacco or alfalfa, which normally lack this
enzyme, the transgenic plant becomes resistant to fungal infection (Mol. Plant
Microbe Interact. 13(5):551-62, 2000; and Nature 361(6408):153-6, 1993). Since
STS uses the exact same substrates as CHS, which is ubiquitous in higher plants,
expression of the STS gene in any of these species should be sufficient to achieve the
in vivo biosynthesis of resveratrol.

Furthermore, resveratrol has also been shown to have a number of beneficial
medicinal activities, including copper chelation, anti-oxidant scavenging of free
radicals, inhibition of both platelet aggregation and lipid peroxidation,
anti-inflammation, vasodilation, anti-cancer (Life Sci. 66(8):663-73, 2000), and the like.
These effects of resveratrol contribute to the health benefits of the moderate
consumption of red wine, known as "the French paradox". Red wine has a higher
resveratrol content than grape juice or white wine, due to the inclusion of the
resveratrol-rich grape skins during the fermentation process.

Thus, production of resveratrol in plants which lack it is biologically useful for
the plant, and medicinally useful for humans who consume the plant. While
transgenic introduction of the stilbene synthase gene has proven effective, enzymes
are often best-adapted for expression and stability within their own species. The
ability to engineer full or partial STS activity into a native CHS of a given species
confers the benefits of resveratrol production to that species, while avoiding all of the
negative effects of foreign transgene expression.

The mutants of the present invention show that it is possible to mutate a native
CHS to a STS-like activity (see Figure 14). Furthermore, it is possible to produce the
STS product resveratrol to varying degrees with different mutants. Thus, a plant can
be manipulated to produce varying levels of resveratrol, without eliminating the
production of the chalcone product required for viability.

Regardless of the transformation system used, a gene encoding a mutant
polyketide synthase is preferably incorporated into a gene transfer vector adapted to
express said gene in a plant cell by including in the vector an expression control
sequence (plant promoter regulatory element). In addition to plant promoter regulatory
elements, promoter regulatory elements from a variety of sources can be used
efficiently in plant cells to express foreign genes. For example, promoter regulatory
elements of bacterial origin, such as the octopine synthase promoter, the nopaline
synthase promoter, the mannopine synthase promoter, and the like, may be used.
Promoters of viral origin, such as those of the cauliflower mosaic virus (35S and 19S),
are also desirable. Plant promoter regulatory elements also include the
ribulose-1,5-bisphosphate carboxylase small subunit promoter, beta-conglycinin
promoter, phaseolin promoter, ADH promoter, heat-shock promoters, tissue specific
promoters, and the like.
Numerous promoters are available to skilled artisans for use at their discretion.

It should be understood that not all expression vectors and expression systems
function in the same way to express the mutated gene sequences of the present
invention. Neither do all host cells function equally well with the same expression
system. However, one skilled in the art may make a selection among these vectors,
expression control sequences, and hosts without undue experimentation and without
departing from the scope of this invention.

Once a synthase of the present invention is expressed, the protein obtained
therefrom can be purified so that structural analysis, modeling, and/or biochemical
analysis can be performed, as exemplified herein. The nature of the protein obtained
can be dependent on the expression system used. For example, genes, when
expressed in mammalian or other eukaryotic cells, may contain latent signal sequences
that may result in glycosylation, phosphorylation, or other post-translational
modifications, which may or may not alter function. Therefore, a preferred
embodiment of the present invention is the expression of mutant synthase genes in
E. coli cells. Once said proteins are expressed, they can be easily purified using
techniques common to the person having ordinary skill in the art of protein
biochemistry, such as, for example, techniques described in Coligan et al., (1997)
Current Protocols in Protein Science, Chanda, V. B., Ed., John Wiley & Sons, Inc.,
which is incorporated by reference herein. Such techniques often include the use of
cation-exchange or anion-exchange chromatography, gel filtration (size exclusion)
chromatography, and the like. Another technique that may be commonly used is
affinity chromatography. Affinity chromatography can include the use of antibodies,
substrate analogs, or histidine residues (His-tag technology).

Once purified, mutants of the present invention may be characterized by any of
several different properties. For example, such mutants may have altered active site
surface charges of one or more charge units. In addition, said mutants may have
altered substrate specificity or product capability relative to a non-mutated polyketide
synthase.

The present invention allows for the characterization of polyketide synthase
mutants by crystallization followed by X-ray diffraction. Polypeptide crystallization
occurs in solutions where the polypeptide concentration exceeds its solubility
maximum (i.e., the polypeptide solution is supersaturated). Such solutions may be
restored to equilibrium by reducing the polypeptide concentration, preferably through
precipitation of the polypeptide crystals. Often polypeptides may be induced to
crystallize from supersaturated solutions by adding agents that alter the polypeptide
surface charges or perturb the interaction between the polypeptide and bulk water to
promote associations that lead to crystallization.

Compounds known as "precipitants" are often used to decrease the solubility
of the polypeptide in a concentrated solution by forming an energetically unfavorable
precipitating layer around the polypeptide molecules (Weber, Advances in Protein
Chemistry, 41:1-36, 1991). In addition to precipitants, other materials are sometimes
added to the polypeptide crystallization solution. These include buffers to adjust the
pH of the solution and salts to reduce the solubility of the polypeptide. Various
precipitants are known in the art and include the following: ethanol,
3-ethyl-2,4-pentanediol, and many of the polyglycols, such as polyethylene glycol.

Commonly used polypeptide crystallization methods include the following
techniques: batch, hanging drop, seed initiation, and dialysis. In each of these
methods, it is important to promote continued crystallization after nucleation by
maintaining a supersaturated solution. In the batch method, polypeptide is mixed with
precipitants to achieve supersaturation, the vessel is sealed, and set aside until crystals
appear. In the dialysis method, polypeptide is retained in a sealed dialysis membrane
that is placed into a solution containing precipitant. Equilibration across the
membrane increases the polypeptide and precipitant concentrations thereby causing
the polypeptide to reach supersaturation levels.

In the preferred hanging drop technique (McPherson, J. Biol. Chem.,
6300-6306, 1976), an initial polypeptide mixture is created by adding a precipitant to a
concentrated polypeptide solution. The concentrations of the polypeptide and
precipitants are such that in this initial form, the polypeptide does not crystallize. A
small drop of this mixture is placed on a glass slide that is inverted and suspended
over a reservoir of a second solution. The system is then sealed. Typically, the
second solution contains a higher concentration of precipitant or other dehydrating
agent. The difference in the precipitant concentrations causes the protein solution to
have a higher vapor pressure than the second solution. Since the system containing the
two solutions is sealed, an equilibrium is established, and water from the polypeptide
mixture transfers to the second solution. This equilibration increases the polypeptide
and precipitant concentration in the polypeptide solution. At the critical concentration
of polypeptide and precipitant, a crystal of the polypeptide will form.
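
By way of illustration only, the following Python sketch works through the
arithmetic of an idealized hanging drop. It assumes that the drop is made by mixing
polypeptide stock with reservoir solution, that the stock contains no precipitant,
that only water is transferred, and that the reservoir is large enough to remain
unchanged. These simplifications, the numerical values, and the function name are
assumptions and are not taken from the cited reference.

```python
def drop_at_equilibrium(protein_mg_ml, reservoir_precipitant,
                        protein_volume_ul, reservoir_volume_ul):
    """Idealized vapor-diffusion endpoint for a drop hung over its reservoir.

    `reservoir_precipitant` may be in any concentration unit (e.g. % w/v);
    the drop's precipitant concentration is reported in the same unit.
    """
    initial_volume = protein_volume_ul + reservoir_volume_ul
    initial_precipitant = reservoir_precipitant * reservoir_volume_ul / initial_volume
    initial_protein = protein_mg_ml * protein_volume_ul / initial_volume
    # Water leaves the drop until its precipitant concentration matches the reservoir.
    final_volume = initial_volume * initial_precipitant / reservoir_precipitant
    factor = initial_volume / final_volume
    return {
        "initial_protein": initial_protein,
        "initial_precipitant": initial_precipitant,
        "final_volume_ul": final_volume,
        "final_protein": initial_protein * factor,
        "final_precipitant": initial_precipitant * factor,
    }

# Toy numbers: 2 ul of 10 mg/ml polypeptide mixed with 2 ul of a 20% precipitant reservoir.
result = drop_at_equilibrium(10.0, 20.0, 2.0, 2.0)
for name, value in result.items():
    print(f"{name}: {value:.1f}")
```

With equal volumes mixed, the drop starts at half the stock and reservoir
concentrations and, under these assumptions, both double as the drop shrinks to
half its volume.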

Another method of crystallization introduces a nucleation site into a
concentrated polypeptide solution. Generally, a concentrated polypeptide solution is
prepared and a seed crystal of the polypeptide is introduced into this solution. If the
concentration of the polypeptide and any precipitants are correct, the seed crystal will
provide a nucleation site around which a larger crystal forms. In preferred
embodiments, the crystals of the present invention are formed in hanging drops.

Some proteins may be recalcitrant to crystallization. However, several
techniques are available to the skilled artisan. Quite often the removal of polypeptide
segments at the amino or carboxy terminal end of the protein is necessary to produce
crystalline protein samples. Said procedures may involve the treatment of the
protein with one of several proteases including trypsin, chymotrypsin, subtilisin, and
the like. This treatment often results in the removal of flexible polypeptide segments
that are likely to negatively affect crystallization. Alternatively, the removal of coding
sequences from the protein's gene facilitates the recombinant expression of shortened
proteins that can be screened for crystallization.

The crystals so produced have a wide range of uses. For example, high quality
crystals are suitable for X-ray or neutron diffraction analysis to determine the
three-dimensional structure of a mutant polyketide synthase and to design additional
mutants thereof. In addition, crystallization can serve as a further purification method.
In some instances, a polypeptide or protein will crystallize from a heterogeneous
mixture into crystals. Isolation of such crystals by filtration, centrifugation, etc.,
followed by redissolving the polypeptide affords a purified solution suitable for use in
growing the high-quality crystals needed for diffraction studies. The high-quality
crystals may also be dissolved in water and then formulated to provide an aqueous
solution having other uses as desired.

Because synthases may crystallize in more than one crystal form, the structural
coordinates of α-carbons of an active site determined from a synthase or portions
thereof, as provided by this invention, are particularly useful to solve the structure of
other crystal forms of synthases. Said structural coordinates, as provided herein, may
also be used to solve the structure of synthases having α-carbons positioned within the
active sites in a manner similar to the wild-type, yet having R-groups that may or may
not be identical.

Furthermore, the structural coordinates disclosed herein may be used to
determine the structure of the crystalline form of other proteins with significant amino
acid or structural homology to any functional domain of a synthase. One method that
may be employed for such purpose is molecular replacement. In this method, the
unknown crystal structure, whether it is another crystal form of a synthase, a synthase
having a mutated active site, or the crystal of some other protein with significant
sequence and/or structural homology to a polyketide synthase, may be determined
using the coordinates given in Table 1. This method provides structural
information for the unknown crystal more efficiently than attempting to determine such
information ab initio. In addition, this method can be used to determine whether or
not a given polyketide synthase in question falls within the scope of this invention.

As further disclosed herein, polyketide synthases and mutants thereof may be
crystallized in the presence or absence of substrates and substrate analogs. The crystal
structures of a series of complexes may then be solved by molecular replacement and
compared to that of the wild-type to assist in determination of suitable replacements
for R-groups within the active site, thus making synthase mutants according to the
present invention.

All mutants of the present invention may be modeled using the information
disclosed herein without necessarily having to crystallize and solve the structure for
each and every mutant. For example, one skilled in the art may use one of several
specialized computer programs to assist in the process of designing synthases having
mutated active sites relative to the wild-type. Examples of such programs include:
GRID (Goodford, 1985, J. Med. Chem., 28:849-857); MCSS (Miranker and Karplus,
1991, Proteins: Structure, Function and Genetics, 11:29-34); AUTODOCK (Goodsell
and Olson, 1990, Proteins: Structure, Function, and Genetics, 8:195-202); and DOCK
(Kuntz et al., 1982, J. Mol. Biol., 161:269-288), and the like, as well as those discussed
in the Examples below. In addition, specific computer programs are also available to
evaluate specific substrate-active site interactions and the deformation energies and
electrostatic interactions resulting therefrom. MODELLER is a computer program
often used for homology or comparative modeling of the three-dimensional structure
of a protein (A. Sali & T.L. Blundell, J. Mol. Biol. 234:779-815, 1993). A sequence to
be modeled is aligned with one or more known related structures and the
MODELLER program is used to calculate a full-atom model, based on optimum
satisfaction of spatial restraints. Such restraints can include, inter alia, homologous
structures, site-directed mutagenesis, fluorescence spectroscopy, NMR experiments,
or atom-atom potentials of mean force.

The present invention enables polyketide synthase mutants to be made and the
crystal structure thereof to be solved. Moreover, by virtue of the present invention,
the location of the active site and the interface of substrate therewith permit the
identification of desirable R-groups for mutagenesis.

The three-dimensional coordinates of the polyketide synthase provided herein
may additionally be used to predict the activity and/or substrate specificity of a protein
whose primary amino acid sequence suggests that it may have polyketide synthase
activity. The family of CHS-related enzymes is defined, in part, by the presence of
four highly conserved amino acid residues, Cys164, Phe215, His303, and Asn336. More
than 400 enzymes having these conserved residues have been identified to date,
including several bacterial proteins. The functions, substrates, and products of many
of these enzymes remain unknown. However, by employing the three-dimensional
coordinates disclosed herein and computer modeling programs, structural comparisons
of CHS can be made with a putative enzyme. Similarities and differences between the
two would provide the skilled artisan with information regarding the activity and/or
substrate specificity of the putative enzyme. This procedure is demonstrated in the
Examples section below.
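
By way of illustration only, the following Python sketch checks whether a putative
sequence carries the four conserved residues named above. It assumes that a
pairwise alignment of the putative sequence against CHS is already available as two
gapped strings of equal length. The residue numbers come from the text, whereas the
toy alignment and the function names are hypothetical.

```python
# Conserved residues named in the text, keyed by CHS residue number (1-based).
CHS_CONSERVED = {164: "C", 215: "F", 303: "H", 336: "N"}

def residues_at_reference_positions(ref_aligned, query_aligned, positions):
    """Map reference residue numbers to the query residue aligned to each ('-' = gap)."""
    if len(ref_aligned) != len(query_aligned):
        raise ValueError("aligned sequences must have equal length")
    wanted = set(positions)
    found = {}
    ref_number = 0
    for ref_char, query_char in zip(ref_aligned, query_aligned):
        if ref_char == "-":
            continue  # insertion in the query; no reference residue here
        ref_number += 1
        if ref_number in wanted:
            found[ref_number] = query_char
    return found

def conserved_report(ref_aligned, query_aligned, expected=CHS_CONSERVED):
    """Return {reference position: True/False} for each expected conserved residue."""
    observed = residues_at_reference_positions(ref_aligned, query_aligned, expected)
    return {pos: observed.get(pos) == aa for pos, aa in expected.items()}

# Toy alignment with toy positions, short enough to verify by eye.
toy_ref = "MC-FHN"
toy_query = "MCAF-N"
toy_expected = {2: "C", 3: "F", 4: "H", 5: "N"}
print(conserved_report(toy_ref, toy_query, toy_expected))
```

With a real alignment against the CHS sequence, the default `CHS_CONSERVED` mapping
would be used in place of the toy positions.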

Thus, in another embodiment of the invention, there is provided a method of
predicting the activity and/or substrate specificity of a putative polyketide synthase
comprising (a) generating a three-dimensional representation of a known polyketide
synthase using three-dimensional coordinate data, (b) generating a predicted
three-dimensional representation of a putative polyketide synthase, and (c) comparing the
representation of the known polyketide synthase with the representation of the
putative polyketide synthase, wherein the similarities and/or differences between the
two representations are predictive of activity and/or substrate specificity of the
putative polyketide synthase.
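
By way of illustration only, the following Python sketch shows one simple way in
which two three-dimensional representations might be compared in step (c). It
computes the root-mean-square difference between corresponding inter-atomic
distances (a distance-matrix RMSD), which does not require the two structures to be
superimposed first. The choice of this measure, the toy coordinates, and the
function name are assumptions; the source does not prescribe a particular
comparison metric.

```python
import math
from itertools import combinations

def distance_rmsd(coords_a, coords_b):
    """RMS difference between matching pairwise distances of two equal-length atom sets."""
    if len(coords_a) != len(coords_b) or len(coords_a) < 2:
        raise ValueError("need two coordinate sets of equal length (at least 2 atoms)")
    squared = [
        (math.dist(coords_a[i], coords_a[j]) - math.dist(coords_b[i], coords_b[j])) ** 2
        for i, j in combinations(range(len(coords_a)), 2)
    ]
    return math.sqrt(sum(squared) / len(squared))

# Toy alpha-carbon coordinates (angstroms).
known = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (0.0, 4.0, 0.0)]
# Same shape, rotated and translated: internal distances are unchanged.
moved = [(10.0, 10.0, 10.0), (10.0, 13.0, 10.0), (6.0, 10.0, 10.0)]
# A different shape: one atom displaced by 1 angstrom.
stretched = [(0.0, 0.0, 0.0), (4.0, 0.0, 0.0), (0.0, 4.0, 0.0)]

print(f"known vs moved:     {distance_rmsd(known, moved):.3f}")
print(f"known vs stretched: {distance_rmsd(known, stretched):.3f}")
```

Because only internal distances are compared, this measure cannot distinguish a
structure from its mirror image; a superposition-based comparison would be needed
for that purpose.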

In a further embodiment of the present invention, there is also provided a
method of identifying a potential substrate of a polyketide synthase comprising
(a) defining the active site of the polyketide synthase based on the atomic coordinates
of said polyketide synthase, (b) identifying a potential substrate that fits the defined
active site, and (c) contacting the polyketide synthase with the potential substrate of
(b) and determining the activity thereon. Techniques for computer modeling and
structural comparisons similar to those described herein for predicting putative
polyketide synthase activity and/or substrate specificity can be used to identify novel
substrates for polyketide synthases.

In addition, the structural coordinates and three-dimensional models disclosed
herein can be used to design or identify polyketide synthase inhibitors. Using the
modeling techniques disclosed herein, potential inhibitor structures can be modeled
with the polyketide synthase active site and those that appear to interact therewith can
subsequently be tested in activity assays in the presence of substrate.

Methods of using crystal structure data to design binding agents or substrates
are known in the art. Thus, the crystal structure data provided herein can be used in
the design of new or improved inhibitors, substrates or binding agents. For example,
the synthase polypeptide coordinates can be superimposed onto other available
coordinates of similar enzymes to identify modifications in the active sites of the
enzymes to create novel products of enzymatic activity or to modulate polyketide
synthesis. Alternatively, the synthase polypeptide coordinates can be superimposed
onto other available coordinates of similar enzymes which have substrates or
inhibitors bound to them to give an approximation of the way these and related
substrates or inhibitors might bind to a synthase. Alternatively, computer programs
employed in the practice of rational drug design can be used to identify compounds
that reproduce interaction characteristics similar to those found between a synthase
polypeptide and a co-crystallized substrate. Furthermore, detailed knowledge of the
nature of binding site interactions allows for the modification of compounds to alter or
improve solubility, pharmacokinetics, etc., without affecting binding activity.

Computer programs are widely available that are capable of carrying out the
activities necessary to design agents using the crystal structure information provided
herein. Examples include, but are not limited to, the computer programs listed below:

    Catalyst Databases™ - an information retrieval program accessing
    chemical databases such as BioByte Master File, Derwent WDI and
    ACD;

    Catalyst/HYPO™ - generates models of compounds and hypotheses to
    explain variations of activity with the structure of drug candidates;

    Ludi™ - fits molecules into the active site of a protein by identifying
    and matching complementary polar and hydrophobic groups;

    Leapfrog™ - "grows" new ligands using a genetic algorithm with
    parameters under the control of the user.

In addition, various general purpose machines may be used with programs
written in accordance with the teachings herein, or it may be more convenient to
construct more specialized apparatus to perform the operations. However, preferably
the embodiment is implemented in one or more computer programs executing on
programmable systems each comprising at least one processor, at least one data storage
system (including volatile and non-volatile memory and/or storage elements), at least
one input device, and at least one output device. The program is executed on the
processor to perform the functions described herein.

Each such program may be implemented in any desired computer language
(including machine, assembly, high level procedural, object oriented programming
languages, or the like) to communicate with a computer system. In any case, the
language may be a compiled or interpreted language. The computer program will
typically be stored on a storage medium or device (e.g., ROM, CD-ROM, or magnetic or
optical media) readable by a general or special purpose programmable computer, for
configuring and operating the computer when the storage medium or device is read by the
computer to perform the procedures described herein. The system may also be
considered to be implemented as a computer-readable storage medium, configured with
a computer program, where the storage medium so configured causes a computer to
operate in a specific and predefined manner to perform the functions described herein.

Embodiments of the invention include systems (e.g., internet based systems),
particularly computer systems which store and manipulate the coordinate and sequence
information described herein. One example of a computer system 100 is illustrated in
block diagram form in Figure 9. As used herein, "a computer system" refers to the
hardware components, software components, and data storage components used to
analyze the coordinates and sequences as set forth in one or more of Accession Nos.
1BI5, 1D6F, 1D6I, 1D6H, 1BQ6, 1CML, 1CHW, 1CGK, 1CGZ, Table 1, and Appendix
A. The computer system 100 typically includes a processor for processing, accessing
and manipulating the sequence data. The processor 105 can be any well-known type of
central processing unit, such as, for example, the Pentium III from Intel Corporation, or
a similar processor from Sun, Motorola, Compaq, AMD or International Business
Machines.

Typically the computer system 100 is a general purpose system that comprises
the processor 105 and one or more internal data storage components 110 for storing
data, and one or more data retrieving devices for retrieving the data stored on the data
storage components. A skilled artisan can readily appreciate that any one of the
currently available computer systems is suitable.

In one particular embodiment, the computer system 100 includes a processor 105
connected to a bus which is connected to a main memory 115 (preferably implemented
as RAM) and one or more internal data storage devices 110, such as a hard drive and/or
other computer readable media having data recorded thereon. In some embodiments,
the computer system 100 further includes one or more data retrieving devices 118 for
reading the data stored on the internal data storage devices 110.

The data retrieving device 118 may represent, for example, a floppy disk drive, a
compact disk drive, a magnetic tape drive, or a modem capable of connection to a
remote data storage system (e.g., via the internet), etc. In some embodiments, the
internal data storage device 110 is a removable computer readable medium such as a
floppy disk, a compact disk, a magnetic tape, etc., containing control logic and/or data
recorded thereon. The computer system 100 may advantageously include or be
programmed by appropriate software for reading the control logic and/or the data from
the data storage component once inserted in the data retrieving device.

The computer system 100 includes a display 120 which is used to display output
to a computer user. It should also be noted that the computer system 100 can be linked
to other computer systems 125a-c in a network or wide area network to provide
centralized access to the computer system 100.

Software for accessing and processing the coordinates and sequences described
herein (such as search tools, compare tools, and modeling tools, etc.) may reside in main
memory 115 during execution.

For the first time, the present invention permits the use of molecular design 
techniques to design, select and synthesize novel enzymes, chemical entities and 
compounds, including inhibitory compounds, capable of binding to a polyketide 
synthase polypeptide (e.g., a chalcone synthase polypeptide), in whole or in part. 

One approach enabled by this invention is to use the structure coordinates as set
forth in one or more of Accession Nos. 1BI5, 1D6F, 1D6I, 1D6H, 1BQ6, 1CML,
1CHW, 1CGK, 1CGZ, 1EE0, Table 1, Appendix A, Appendix B and Appendix C to
design new enzymes capable of synthesizing novel and known polyketides. For
example, polyketide synthases (PKSs) generate molecular diversity in their products by
utilizing different starter molecules and by varying the final size of the polyketide chain.
The structural coordinates disclosed herein allow the elucidation of the nature by which
PKSs achieve starter molecule selectivity and control polyketide chain length. For
example, by comparing the structure of chalcone synthase, which yields a tetraketide
product, to that of 2-pyrone synthase, which forms a triketide product, the invention
demonstrates that 2-pyrone synthase maintains a smaller initiation/elongation cavity.
Accordingly, generation of a chalcone synthase mutant with an active site sterically
analogous to 2-pyrone synthase results in the synthesis of a polyketide product of a
different size. As discussed more fully below, this invention allows for the strategic
development and biosynthesis of more diverse polyketides and demonstrates a structural
basis for control of polyketide chain length in other PKSs. In addition, the structural
coordinates allow for the development of substrates or binding agents that bind to the
polypeptide and alter the physical properties of the compounds in different ways, e.g.,
solubility.

In another approach a polyketide synthase polypeptide crystal is probed with 
molecules composed of a variety of different chemical entities to determine optimal sites 
for interaction between candidate binding molecules (e.g., substrates) and the polyketide 
synthase (e.g., chalcone synthase). 

In another embodiment, an approach made possible and enabled by this
invention is to screen small molecule databases computationally for chemical entities
or compounds that can bind in whole, or in part, to a polyketide synthase polypeptide or
fragment thereof. In this screening, the quality of fit of such entities or compounds to
the binding site may be judged either by shape complementarity or by estimated
interaction energy. Meng, E. C. et al., J. Comp. Chem., 13:505-524 (1992).
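The energy-based ranking described above can be sketched in a few lines of Python. This is a minimal illustration only, not the scoring function of any of the programs cited herein: it assumes a single generic 12-6 Lennard-Jones term for every atom pair, and the function names and the epsilon and sigma values are hypothetical placeholders rather than force-field parameters.

```python
import math

def lj_energy(ligand, site, epsilon=0.2, sigma=3.5):
    """Sum a 12-6 Lennard-Jones term over every ligand/site atom pair.

    ligand, site: lists of (x, y, z) coordinates in angstroms.
    epsilon (kcal/mol) and sigma (angstroms) are illustrative values only.
    """
    total = 0.0
    for atom_l in ligand:
        for atom_s in site:
            r = math.dist(atom_l, atom_s)
            sr6 = (sigma / r) ** 6
            total += 4.0 * epsilon * (sr6 * sr6 - sr6)
    return total

def rank_candidates(candidates, site):
    """Return candidate names ordered from most to least favorable energy.

    candidates: dict mapping a name to a list of atom coordinates.
    """
    return sorted(candidates, key=lambda name: lj_energy(candidates[name], site))
```

A candidate whose atoms clash with the site receives a large positive energy and falls to the end of the ranking, while one sitting near the energy minimum is ranked first. A practical screen would use per-atom-type parameters and add electrostatic and desolvation terms.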

Because chalcone synthase is a highly representative member of a family of
polyketide synthase polypeptides, many of which have similar functional activity, the
structure coordinates of chalcone synthase, or portions thereof, as provided by this
invention are particularly useful to solve the structure, function or activity of other
crystal forms of polyketide synthase molecules. They may also be used to solve the
structure of a polyketide synthase or a chalcone synthase mutant.



One method that may be employed for this purpose is molecular replacement. In
this method, the unknown crystal structure, whether it is another polyketide synthase
crystal form, a polyketide synthase or chalcone synthase mutant, or a polyketide
synthase complexed with a substrate or other molecule, or the crystal of some other
protein with significant amino acid sequence homology to any polyketide synthase
polypeptide, may be determined using the structure coordinates as provided in one or
more of Accession Nos. 1BI5, 1D6F, 1D6I, 1D6H, 1BQ6, 1CML, 1CHW, 1CGK,
1CGZ, 1EE0, Table 1, Appendix A, Appendix B or Appendix C. This method will
provide an accurate structural form for the unknown crystal more quickly and efficiently
than attempting to determine such information ab initio.

In addition, in accordance with the present invention, a polyketide synthase or
chalcone synthase polypeptide mutant may be crystallized in association or complex
with known polyketide synthase binding agents, substrates, products or inhibitors. The
crystal structures of a series of such complexes may then be solved by molecular
replacement and compared with that of wild-type polyketide synthase molecules.
Potential sites for modification within the synthase molecule may thus be identified.
This information provides an additional tool for determining the most efficient binding
interactions between a polyketide synthase and a chemical entity, substrate, product or
compound.

All of the complexes referred to above may be studied using well-known X-ray
diffraction techniques and may be refined to 2-3 Å resolution X-ray data to an R value
of about 0.20 or less using computer software, such as X-PLOR (Yale University, 1992,
distributed by Molecular Simulations, Inc.). See, e.g., Blundell & Johnson, supra;
Methods in Enzymology, vol. 114 and 115, H. W. Wyckoff et al., eds., Academic Press
(1985). This information may thus be used to optimize known classes of polyketide
synthase substrates or binding agents (e.g., inhibitors), and to design and synthesize
novel classes of polyketide synthases, substrates, and binding agents (e.g., inhibitors).
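For readers unfamiliar with the R value quoted above, the conventional crystallographic residual is R = Σ| |Fobs| - |Fcalc| | / Σ|Fobs|, summed over reflections. The following Python sketch computes it for two lists of structure-factor amplitudes. It is an illustration under simplifying assumptions (the amplitudes are taken to be already on a common scale, and the function name is hypothetical); refinement programs such as X-PLOR also fit a scale factor and apply resolution and sigma cutoffs.

```python
def r_factor(f_obs, f_calc):
    """Conventional crystallographic R value for paired amplitudes.

    f_obs, f_calc: equal-length sequences of observed and calculated
    structure-factor amplitudes, assumed to be on the same scale.
    """
    if len(f_obs) != len(f_calc) or len(f_obs) == 0:
        raise ValueError("need two non-empty lists of equal length")
    numerator = sum(abs(abs(o) - abs(c)) for o, c in zip(f_obs, f_calc))
    denominator = sum(abs(o) for o in f_obs)
    return numerator / denominator

def meets_target(f_obs, f_calc, target=0.20):
    """True when the residual is at or below the stated target R value."""
    return r_factor(f_obs, f_calc) <= target
```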

The design of substrates, compounds or binding agents that bind to or inhibit a
polyketide synthase polypeptide according to the invention generally involves
consideration of two factors. First, the substrate, compound or binding agent must be
capable of physically and structurally associating with a polyketide synthase molecule.
Non-covalent molecular interactions important in the association of a polyketide
synthase with a substrate include hydrogen bonding, van der Waals and hydrophobic
interactions, and the like.

Second, the substrate, compound or binding agent must be able to assume a
conformation that allows it to associate with a polyketide synthase molecule. Although
certain portions of the substrate, compound or binding agent will not directly participate
in this association, those portions may still influence the overall conformation of the
molecule. This, in turn, may have a significant impact on potency. Such conformational
requirements include the overall three-dimensional structure and orientation of the
chemical entity or compound in relation to all or a portion of the binding site, e.g., active
site or accessory binding site of a polyketide synthase (e.g., a chalcone synthase
polypeptide), or the spacing between functional groups of a substrate or compound
comprising several chemical entities that directly interact with a polyketide synthase.

The potential binding effect of a substrate or chemical compound on a
polyketide synthase, or the activity a newly synthesized or mutated polyketide synthase
might have on a known substrate, may be analyzed prior to its actual synthesis and
testing by the use of computer modeling techniques. For example, if the theoretical
structure of the given substrate or compound suggests insufficient interaction and
association between it and a polyketide synthase, synthesis and testing of the compound
may be obviated. However, if computer modeling indicates a strong interaction, the
molecule may then be tested for its ability to bind to a polyketide synthase, or to initiate
catalysis or elongation of a polyketide by a polyketide synthase. Methods of assaying for
polyketide synthase activity are known in the art (as identified and discussed herein).
Methods for assaying the effect of a newly created polyketide synthase or a potential
substrate or binding agent can be performed in the presence of a known binding agent or
polyketide synthase. For example, the effect of the potential binding agent can be assayed
by measuring the ability of the potential binding agent to compete with a known substrate.

A mutagenized synthase, novel synthase, substrate or other binding compound of
a polyketide synthase may be computationally evaluated and designed by means of a
series of steps in which chemical entities or fragments are screened and selected for their
ability to associate with binding pockets or other areas of the polyketide synthase.

One skilled in the art may use one of several methods to screen chemical entities
or fragments for their ability to associate with a polyketide synthase and more
particularly with the individual binding pockets of a chalcone synthase polypeptide.
This process may begin by visual inspection of, for example, the active site on the
computer screen based on the coordinates in one or more of Accession Nos. 1BI5,
1D6F, 1D6I, 1D6H, 1BQ6, 1CML, 1CHW, 1CGK, 1CGZ, 1EE0, Table 1, Appendix A,
Appendix B or Appendix C. Selected fragments or substrates or chemical entities may
then be positioned in a variety of orientations, or docked, within an individual binding
pocket of a polyketide synthase. Docking may be accomplished using software such as
Quanta and Sybyl, followed by energy minimization and molecular dynamics with
standard molecular mechanics forcefields, such as CHARMM and AMBER.

Specialized computer programs may also assist in the process of selecting
fragments or chemical entities. These include:

1. GRID (Goodford, P. J., "A Computational Procedure for Determining
Energetically Favorable Binding Sites on Biologically Important Macromolecules", J.
Med. Chem., 28:849-857 (1985)). GRID is available from Oxford University, Oxford,
UK.

2. MCSS (Miranker, A. and M. Karplus, "Functionality Maps of Binding Sites:
A Multiple Copy Simultaneous Search Method", Proteins: Structure, Function, and
Genetics, 11:29-34 (1991)). MCSS is available from Molecular Simulations, Burlington,
Mass.

3. AUTODOCK (Goodsell, D. S. and A. J. Olson, "Automated Docking of
Substrates to Proteins by Simulated Annealing", Proteins: Structure, Function, and
Genetics, 8:195-202 (1990)). AUTODOCK is available from Scripps Research Institute,
La Jolla, Calif.

4. DOCK (Kuntz, I. D. et al., "A Geometric Approach to Macromolecule-Ligand
Interactions", J. Mol. Biol., 161:269-288 (1982)). DOCK is available from University of
California, San Francisco, Calif.
Once suitable substrates, chemical entities or fragments have been selected, they
can be assembled into a single polypeptide, compound or binding agent (e.g., an
inhibitor). Assembly may be performed by visual inspection of the relationship of the
fragments to each other on the three-dimensional image displayed on a computer screen
in relation to the structure coordinates of the molecules as set forth in one or more of
Accession Nos. 1BI5, 1D6F, 1D6I, 1D6H, 1BQ6, 1CML, 1CHW, 1CGK, 1CGZ, 1EE0,
Table 1, Appendix A, Appendix B or Appendix C. This would be followed by manual
model building using software such as Quanta or Sybyl.

Useful programs to aid one of skill in the art in connecting the individual
chemical entities or fragments include:

1. CAVEAT (Bartlett, P. A. et al., "CAVEAT: A Program to Facilitate the
Structure-Derived Design of Biologically Active Molecules". In "Molecular
Recognition in Chemical and Biological Problems", Special Pub., Royal Chem. Soc.,
78, pp. 182-196 (1989)). CAVEAT is available from the University of California,
Berkeley, Calif.

2. 3D Database systems such as MACCS-3D (MDL Information Systems, San
Leandro, Calif.). This area is reviewed in Martin, Y. C., "3D Database Searching in
Drug Design", J. Med. Chem., 35:2145-2154 (1992).

3. HOOK (available from Molecular Simulations, Burlington, Mass.).

In addition to the method of building or identifying novel enzymes or a
polyketide synthase substrate or binding agent in a step-wise fashion, one fragment or
chemical entity at a time as described above, substrates, inhibitors or other polyketide
synthase interactions may be designed as a whole or "de novo" using either an empty
active site or optionally including some portion(s) of known substrates, binding agents
or inhibitors. These methods include:

1. LUDI (Bohm, H.-J., "The Computer Program LUDI: A New Method for the
De Novo Design of Enzyme Inhibitors", J. Comp. Aid. Molec. Design, 6:61-78 (1992)).
LUDI is available from Biosym Technologies, San Diego, Calif.

2. LEGEND (Nishibata, Y. and A. Itai, Tetrahedron, 47:8985 (1991)).
LEGEND is available from Molecular Simulations, Burlington, Mass.

3. LeapFrog (available from Tripos Associates, St. Louis, Mo.).

Other molecular modeling techniques may also be employed in accordance with
this invention. See, e.g., Cohen, N. C. et al., "Molecular Modeling Software and
Methods for Medicinal Chemistry", J. Med. Chem., 33:883-894 (1990). See also,
Navia, M. A. and M. A. Murcko, "The Use of Structural Information in Drug Design",
Current Opinions in Structural Biology, 2:202-210 (1992).

Once a substrate, compound or binding agent has been designed or selected by 
the above methods, the efficiency with which that substrate, or binding agent may bind 
to a polyketide synthase may be tested and optimized by computational evaluation. 

A substrate or compound designed or selected as a polyketide binding agent may
be further computationally optimized so that in its bound state it would preferably lack
repulsive electrostatic interaction with the target site. Such non-complementary (e.g.,
electrostatic) interactions include repulsive charge-charge, dipole-dipole and charge-dipole
interactions. Specifically, the sum of all electrostatic interactions between the
binding agent and the polyketide synthase, when the binding agent is bound to the
synthase, preferably makes a neutral or favorable contribution to the enthalpy of binding.
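The electrostatic criterion stated above can be illustrated with a simple point-charge Coulomb sum. The sketch below is an assumption-laden illustration and not the method of the programs named in the next paragraph: it uses fixed partial charges, a uniform dielectric constant chosen by the user, and the conversion factor of about 332.06 kcal·Å/(mol·e²) that is conventional for charges in electron units and distances in angstroms. The function names are hypothetical.

```python
import math

COULOMB_KCAL = 332.06  # kcal*angstrom/(mol*e^2), conventional conversion factor

def electrostatic_energy(agent, synthase, dielectric=4.0):
    """Pairwise Coulomb energy (kcal/mol) between two sets of point charges.

    agent, synthase: lists of (charge, (x, y, z)) tuples, with charges in
    electron units and coordinates in angstroms. The uniform dielectric is
    a simplifying assumption.
    """
    total = 0.0
    for q_a, xyz_a in agent:
        for q_s, xyz_s in synthase:
            r = math.dist(xyz_a, xyz_s)
            total += COULOMB_KCAL * q_a * q_s / (dielectric * r)
    return total

def is_neutral_or_favorable(agent, synthase, dielectric=4.0, tolerance=1e-6):
    """True when the summed electrostatic energy is not repulsive."""
    return electrostatic_energy(agent, synthase, dielectric) <= tolerance
```

An agent that places a negative group next to a positively charged side chain yields a negative sum and passes the check, whereas like charges in contact yield a positive sum and would prompt redesign.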

Specific computer software is available in the art to evaluate compound
deformation energy and electrostatic interaction. Examples of programs designed for
such uses include: Gaussian 92, revision C (M. J. Frisch, Gaussian, Inc., Pittsburgh, Pa.,
1992); AMBER, version 4.0 (P. A. Kollman, University of California at San Francisco,
1994); QUANTA/CHARMM (Molecular Simulations, Inc., Burlington, Mass., 1994);
and Insight II/Discover (Biosym Technologies Inc., San Diego, Calif., 1994). These
programs may be implemented, for example, using a Silicon Graphics workstation, IRIS
4D/35 or IBM RISC/6000 workstation model 550. Other hardware systems and
software packages, the speed and capacity of which are continually modified, will be
known to those skilled in the art.

Once a polyketide synthase, polyketide synthase substrate or polyketide synthase
binding agent has been selected or designed, as described above, substitutions may then
be made in some of its atoms or side groups in order to improve or modify its binding
properties. Generally, initial substitutions are conservative, e.g., the replacement group
will have approximately the same size, shape, hydrophobicity and charge as the original
group. Such substituted chemical compounds may then be analyzed for efficiency of fit
to a polyketide synthase substrate or fit of a modified substrate to a polyketide synthase
having a structure defined by the coordinates in one or more of Accession Nos. 1BI5,
1D6F, 1D6I, 1D6H, 1BQ6, 1CML, 1CHW, 1CGK, 1CGZ, 1EE0, Table 1, Appendix A,
Appendix B, or Appendix C, by the same computer methods described above.

Conserved regions of the polyketide family synthases lend themselves to the
methods and compositions of the invention. For example, pyrone synthase and
chalcone synthase have conserved residues present within their active sites (as
described more fully below). Accordingly, modification to the active site of chalcone
synthase or a chalcone synthase substrate can be extrapolated to other conserved
members of the polyketide family of synthases such as, for example, pyrone synthase.

Functional fragments of polyketide synthase polypeptides such as, for
example, fragments of chalcone synthase can be designed based on the crystal
structure and atomic coordinates described herein. Fragments of a chalcone synthase
polypeptide and the fragment's corresponding atomic coordinates can be used in the
modeling described herein. In addition, such fragments may be used to design novel
substrates or modified active sites to create new diverse polyketides.

In one embodiment of the present invention, the crystal structure and atomic
coordinates allow for the design of novel polyketide synthases and novel polyketide
synthase substrates. The development of new polyketide synthases will lead to the
development of a biodiverse repertoire of polyketides for use as antibiotics, anti-cancer
agents, anti-fungal agents and other therapeutic agents as described herein or known
in the art. In vitro assay systems for production and determination of activity are
known in the art. For example, antibiotic activities of novel polyketides can be
measured by any number of anti-microbial techniques currently used in hospitals and
laboratories. In addition, anticancer activity can be determined by contacting cells
having a cell proliferative disorder with a newly synthesized polyketide and measuring
the proliferation or apoptosis of the cells before and after contact with the polyketide.
Specific examples of apoptosis assays are provided in the following references:
Lymphocyte: C. J. Li et al., Science 268:429-431, 1995; D. Gibellini et al., Br. J.
Haematol. 89:24-33, 1995; S. J. Martin et al., J. Immunol. 152:330-42, 1994; C. Terai
et al., J. Clin. Invest. 87:1710-5, 1991; J. Dhein et al., Nature 373:438-441, 1995; P.
D. Katsikis et al., J. Exp. Med. 181:2029-2036, 1995; Michael O. Westendorp et al.,
Nature 375:497, 1995; DeRossi et al., Virology 198:234-44, 1994. Fibroblasts: H.
Vossbeck et al., Int. J. Cancer 61:92-97, 1995; S. Goruppi et al., Oncogene 9:1537-
44, 1994; A. Fernandez et al., Oncogene 9:2009-17, 1994; E. A. Harrington et al.,
Embo J. 13:3286-3295, 1994; N. Itoh et al., J. Biol. Chem. 268:10932-7, 1993.
Neuronal Cells: G. Melino et al., Mol. Cell. Biol. 14:6584-6596, 1994; D. M.
Rosenbaum et al., Ann. Neurol. 36:864-870, 1994; N. Sato et al., J. Neurobiol.
25:1227-1234, 1994; G. Ferrari et al., J. Neurosci. 15:2857-2866, 1995; A. K.
Talley et al., Mol. Cell. Biol. 15:2359-2366, 1995; G. Walkinshaw et al., J. Clin.
Invest. 95:2458-2464, 1995.
Insect Cells: R. J. Clem et al., Science 254:1388-90, 1991; N. E. Crook et al., J. Virol.
67:2168-74, 1993; S. Rabizadeh et al., J. Neurochem. 61:2318-21, 1993; M. J.
Birnbaum et al., J. Virol. 68:2521-8, 1994; R. J. Clem et al., Mol. Cell. Biol. 14:5212-
5222, 1994. Other assays are well within the ability of those of skill in the art.

Production of novel polyketides or polyketide synthases can be carried out in
culture. For example, mammalian expression constructs carrying polyketide synthases
can be introduced into various cell lines such as CHO, 3T3, HL60, Rat-1, or Jurkat
cells. In addition, SF21 insect cells may be used, in which case the
polyketide synthase gene is expressed using an insect heat shock promoter.

In another embodiment of the present invention, there is provided a method of
designing a mutant polyketide synthase. The method includes comparing a crystal
structure of a wild type polyketide synthase with the crystal structure of a second
polyketide synthase and substituting one or more amino acids with the amino acid
residues at homologous positions in the second polyketide synthase. Invention
methods can guide identification of the areas or active sites, and second tier interaction
residues, required for synthase activity. Such areas can be mutated to modify one synthase to
resemble another synthase, thereby allowing production of a product not typically
synthesized by the wild type enzyme.
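The substitution step of this design method can be sketched as follows, assuming the two synthases have already been aligned (for example, from a structural superposition) and the positions of interest have been chosen by inspection. The function name, the toy sequences and the residue numbers in the usage note are hypothetical and do not correspond to any enzyme disclosed herein.

```python
def homology_mutations(wild_aln, second_aln, positions, start=1):
    """List point mutations that copy residues from a second synthase.

    wild_aln, second_aln: aligned one-letter sequences of equal length,
        with '-' marking gaps.
    positions: residue numbers, in wild-type numbering, to consider.
    start: residue number of the first wild-type residue in the alignment.
    Returns strings such as 'T3L' (wild-type residue, number, new residue).
    """
    if len(wild_aln) != len(second_aln):
        raise ValueError("aligned sequences must have the same length")
    wanted = set(positions)
    mutations = []
    resnum = start - 1
    for wild_res, second_res in zip(wild_aln, second_aln):
        if wild_res == "-":
            continue  # insertion in the second enzyme; no wild-type residue
        resnum += 1
        if resnum in wanted and second_res != "-" and second_res != wild_res:
            mutations.append(f"{wild_res}{resnum}{second_res}")
    return mutations
```

For instance, calling `homology_mutations("MKT-GSA", "MKLAGLA", [3, 5])` on these made-up sequences proposes replacing the wild-type residues at positions 3 and 5 with their counterparts from the second sequence; positions where the two enzymes already agree, or where the second enzyme has a gap, are skipped.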

In another embodiment of the present invention, once a novel substrate or
binding agent is developed by the computer methodology discussed above, the invention
provides a method for determining the ability of the substrate or agent to be acted upon
by a polyketide synthase. The method includes contacting components comprising the
substrate or agent and a polyketide synthase polypeptide, or a recombinant cell
expressing a polyketide synthase polypeptide, under conditions sufficient to allow the
substrate or agent to interact and determining the effect of the agent on the activity of the
polypeptide. The term "effect", as used herein, encompasses any means by which
protein activity can be modulated, and includes measuring the interaction of the agent
with the polyketide synthase molecule by physical means including, for example,
fluorescence detection of the binding of an agent to the polypeptide. Such agents can
include, for example, polypeptides, peptidomimetics, chemical compounds, small
molecules, substrates and biologic agents as described herein. Examples of small
molecules include but are not limited to small peptides or peptide-like molecules.

Contacting or incubating includes conditions which allow contact between the
test agent or substrate and a polyketide synthase or modified polyketide synthase
polypeptide or a cell expressing a polyketide synthase or modified polyketide synthase
polypeptide. Contacting includes in solution and in solid phase. The substrate or test
agent may optionally be a combinatorial library for screening a plurality of substrates or
test agents. Agents identified in the method of the invention can be further evaluated by
chromatography, cloning, sequencing, and the like.

Although methods and materials similar or equivalent to those described
herein can be used to practice the invention, suitable methods and materials are
described below. All publications, patent applications, patents and other references
mentioned herein are incorporated by reference in their entirety. The invention is
described in greater detail by reference to the following non-limiting examples.

EXAMPLES

Mutagenesis, expression, and purification. Alfalfa CHS2 cDNA (Junghans,
H., et al., Plant Mol. Biol. 22:239-253, 1993) was subcloned into the pHIS8 plasmid
vector derived from pET-28a(+) (Novagen). PCR-based mutagenesis using the
QuikChange system (Stratagene) generated the various mutants including C164S,
C164D, H303A, H303Q, H303D, H303T, N336A, N336D, N336Q, N336H, F215S, F215Y and
F215W. N-terminal His8-tagged CHS was expressed in BL21(DE3) E. coli cells. Cells
were harvested and lysed by sonication. His-tagged CHS was purified from bacterial
sonicates using a Ni-NTA (Qiagen) column. Thrombin digest removed the His-tag
and the protein was passed over another Ni-NTA column and a benzamidine-Sepharose
(Pharmacia) column. The final purification step used a Superdex 200
16/60 (Pharmacia) column.

Crystallization. CHS crystals (wild-type and C164S mutant) were grown by
vapor diffusion at 4° C in 2 µl drops containing a 1:1 mixture of 25 mg/ml protein and
crystallization buffer (2.2-2.4 M ammonium sulfate and 0.1 M PIPES, pH 6.5) in the
presence or absence of 5 mM DTT. Prior to freezing at 105 K, crystals were
stabilized in 40% (v/v) PEG400, 0.1 M PIPES (pH 6.5), and 0.050-0.075 M
ammonium sulfate. This cryoprotectant was used for heavy atom soaks. Likewise, all
substrate and product analog complexes were obtained by soaking crystals in
cryoprotectant containing 10-20 mM of the compound.

STS from Pinus sylvestris was crystallized using 13-14% PEG 8000, 0.3M 
ammonium acetate, 0.1M HEPES buffer (pH 7.4) at 4° C. Crystals were soaked for 
60 seconds in the same solution plus 10% glycerol. 

STS from Arachis hypogaea was crystallized using 14% PEG 8000, 0.1M
MOPSO buffer (pH 7.0), with 3% ethylene glycol at 4° C. Crystals were soaked for
30 seconds in the same solution plus 10% ethylene glycol.

18xCHS mutant was crystallized using 21% PEG 8000, 0.3M ammonium
acetate, 0.1M HEPES buffer (pH 7.5) at 4° C. Crystals were soaked for 60 seconds in
the same solution plus 10% glycerol.

Data Collection and Processing. X-ray diffraction data were collected at 105
K using a DIP2000 imaging plate system (Mac-Science Corporation, Japan) and CuKα
radiation produced by a rotating anode operated at 45 kV and 100 mA and equipped
with double focusing Pt/Ni coated mirrors. Native CHS crystals belong to space
group P3₂21 with unit cell dimensions of a = b = 97.54 Å; c = 65.52 Å with a single
monomer per asymmetric unit. Data were indexed and integrated using DENZO
(Otwinowski & Minor, Meth. Enzymol. 276:307-326, 1997) and scaled with
SCALEPACK (Otwinowski & Minor, Meth. Enzymol. 276:307-326, 1997). The
heavy atom derivative datasets were scaled against the native dataset with SCALEIT
(CCP4 Suite: Programs for protein crystallography, Acta Crystallogr. D 50:760-763,
1994).
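As a plausibility check on the cell reported above, the cell volume and the Matthews coefficient (crystal volume per unit of protein mass) can be computed directly. The sketch below uses the stated cell edges and the stated single monomer per asymmetric unit, together with the six asymmetric units per cell of space group P3₂21. The monomer mass of 42,000 Da is an assumed round figure for illustration and is not a value given in this section; the function names are likewise hypothetical.

```python
import math

def hexagonal_cell_volume(a, c):
    """Volume in cubic angstroms of a cell with a = b,
    alpha = beta = 90 degrees and gamma = 120 degrees."""
    return a * a * c * math.sin(math.radians(120.0))

def matthews_coefficient(cell_volume, n_asym_units, monomers_per_asu, mass_da):
    """Crystal volume per dalton of protein (cubic angstroms per Da)."""
    return cell_volume / (n_asym_units * monomers_per_asu * mass_da)

A_AXIS, C_AXIS = 97.54, 65.52   # angstroms, from the text
N_ASYM_UNITS = 6                # asymmetric units per cell in P3(2)21
MONOMERS_PER_ASU = 1            # from the text
ASSUMED_MASS_DA = 42_000.0      # assumption, not stated in this section

volume = hexagonal_cell_volume(A_AXIS, C_AXIS)
v_m = matthews_coefficient(volume, N_ASYM_UNITS, MONOMERS_PER_ASU, ASSUMED_MASS_DA)
```

With these inputs the cell volume is roughly 5.4 × 10^5 Å³, and the resulting coefficient of a little over 2 Å³/Da falls within the range commonly observed for protein crystals, which is consistent with one monomer per asymmetric unit.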

Structure determination. MIRAS was used to solve the structure of native
CHS using native data set 1 (1.8 Å). Initial phasing was performed with derivative
datasets including reflections to 2.3 Å resolution. Heavy atom positions for the
Hg(OAc)2 derivative were estimated by inspection of difference Patterson maps using
the program XTALVIEW (McRee, J. Mol. Graph. 10:44-46, 1992) and initially
refined with MLPHARE (Otwinowski, Z. in CCP4 Proc. 80-88, Daresbury
Laboratory, Warrington, UK, 1991). Heavy atom positions for the additional
derivative data sets were determined by difference Fourier analysis using phases
calculated from the Hg(OAc)2 data set and the Hg positions. These sites were
confirmed by inspection of difference Patterson maps. Final refinement of heavy
atom parameters, identification of minor heavy atom binding sites, and phase-angle
calculations were performed with the program SHARP (de La Fortelle & Bricogne,
Meth. Enzymol. 276:472-494, 1997). MIRAS phases were improved and extended to
1.8 Å by solvent flipping using the CCP4 program SOLOMON (Abrahams and Leslie,
Acta Crystallogr. D 52:30-42, 1996).
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Model building and refinement The program O (Jones, et ah, Acta 
Crystallogr. D 49 :148-157, 1993) was used for model building and graphical display 
of the molecules and electron-density maps. The experimental map for the native 1 
dataset at 1.8 A was of high quality and allowed unambiguous modeling of residues 3 
5 to 389. The model was first refined with REFMAC (Murshudov, et al.Acta 

Crystallogr. D 53:240-255, 1997) and ARP (Lamzin and Wilson, Acta Crystallogr. D 
49:129-147, 1993) against the native 1 dataset. This was followed by manual 
adjustments using I2F 0 -F C 1 difference maps. Water molecules introduced by ARP 
were edited using the I2F 0 -F c l and IF 0 -F C 1 maps. A second refinement with SHELX- 

10 97 (Sheldrick & Schneider, Meth. Enzymol. 277:319-343, 1997) was then carried out 
against the native 2 data set to 1 .56 A resolution. Structures of CHS complexed with 
naringenin and resveratrol and the C164S mutant complexed with malonyl- and 
hexanoyl-CoA were obtained using difference Fourier methods and were refined with 
REFMAC and ARP. All structures were checked with PROCHECK (Laskowski, et 

15 al 9 J. Appl Crystallogr. 26:283-291, 1993). 91.3 % of the residues in CHS are in the 
most favored regions of the Ramachandran plot, 8.4% in the additional allowed 
region, and 0.3% in the generously allowed region. 

Three dimensional structure determination and description 

Recombinant alfalfa CHS2 was expressed in E. coli, affinity purified using an
N-terminal poly-His linker, and crystallized. The structure of wild-type CHS was
determined using multiple isomorphous replacement supplemented with anomalous
scattering (MIRAS). The final 1.56 Å resolution apoenzyme model of CHS included
2982 protein atoms and 355 water molecules. In addition, the structures of a series of
complexes were obtained by difference Fourier analysis. First, a crystal of a mutant
(C164S) was soaked with malonyl-CoA. This mutant retains limited catalytic activity,
and the resulting acetyl-CoA complex yields insight into the decarboxylation reaction.
The same mutant was also complexed with hexanoyl-CoA to mimic the structure of a
linear polyketide-CoA reaction intermediate. Finally, two product analogs, naringenin
and resveratrol (see Figure 1), were complexed with CHS to provide information on
how the enzyme governs sequential addition of acetates to the coumaroyl moiety and
how CHS controls the stereochemistry of the polyketide cyclization reaction. In
plants, chalcone isomerase rapidly and stereospecifically converts chalcone to
naringenin ((-)-(2S)-5,7,4'-trihydroxyflavanone) through an additional ring closure.
This reaction also occurs at a slower rate and non-stereospecifically in solution. As
such, naringenin provides a suitable mimic of the CHS reaction product. Moreover,
since STS uses the same substrates as CHS but a different cyclization pathway for the
biosynthesis of resveratrol, resveratrol was also soaked into CHS to investigate the
structural features governing cyclization of the same substrates into two different
products.

CHS functions as a homodimer of two 42 kDa polypeptides. The structure of
CHS revealed that the enzyme forms a symmetric dimer with each monomer related
by a 2-fold crystallographic axis (see Figure 2). The dimer interface buries
approximately 1580 Å² with interactions occurring along a fairly flat surface. Two
distinct structural features delineate the ends of this interface. First, the N-terminal
helix of monomer A entwines with the corresponding helix of monomer B. Second, a
tight loop containing a cis-peptide bond between Met137 and Pro138 exposes the
methionine sidechain as a knob on the monomer surface. Across the interface, Met137
protrudes into a hole found in the surface of the adjoining monomer to form part of
the cyclization pocket.

Each CHS monomer consists of two structural domains. The upper domain
exhibits an αβαβα pseudo-symmetric motif originally observed in thiolase from
Saccharomyces cerevisiae (Mathieu, et al., Structure 2:797-808, 1994). The upper
domains of CHS and thiolase are superimposable with an r.m.s. deviation of 3.3 Å for
266 equivalent Cα-atoms. Both enzymes use a cysteine as a nucleophile and shuttle
reaction intermediates via CoA molecules. However, CHS condenses a p-coumaroyl-
and three malonyl-CoA molecules through an iterative series of reactions, whereas
thiolase generates two acetyl-CoA molecules from acetoacetyl-CoA and free CoA.
The drastic structural differences in the lower domain of CHS create a larger active
site than that of thiolase and provide space for the polyketide reaction intermediates
required for chalcone formation.

The CHS homodimer contains two functionally independent active sites.
Consistent with this information, bound CoA thioesters and product analogs occupy
both active sites of the homodimer in the CHS complex structures. These structures
identify the location of the active site at the cleft between the upper and lower
domains of each monomer. Each active site consists almost entirely of residues from
a single monomer with Met137 from the adjoining monomer being the only exception.
There are remarkably few chemically reactive residues in the active site. Four
residues conserved in all the known CHS-related enzymes (Cys164, Phe215, His303, and
Asn336) define the active site. Cys164 apparently serves as the nucleophile and as the
attachment site for polyketide intermediates as previously suggested for both CHS and
STS (Lanz, et al., J. Biol. Chem. 266:9971-9976, 1991). His303 most likely acts as a
general base during the generation of a nucleophilic thiolate anion from Cys164, since
the Nε of His303 is within hydrogen bonding distance of the sulfur of Cys164. Phe215
and Asn336 may function in the decarboxylation reaction, as discussed below.
Topologically, three interconnected cavities intersect with these four residues and
form the active site architecture of CHS. These cavities include a CoA-binding
tunnel, a coumaroyl-binding pocket, and a cyclization pocket.

The CoA-binding tunnel is 16 Å long and links the surrounding
solvent with the buried active site. Binding of the CoA moiety in this tunnel positions
substrates at the active site, as observed in the C164S mutant (described in greater
detail below) complexed with malonyl- or hexanoyl-CoA. The conformation of the
CoA molecules bound to CHS resembles that observed in other CoA binding
enzymes. The adenosine nucleoside is in the 2'-endo conformation with an anti-
glycosidic bond torsion angle. At the tunnel entrance, Lys55, Arg58, and Lys62
hydrogen bond with two phosphates of CoA. Apart from these interactions, and an
additional hydrogen bond between the backbone amide nitrogen of Ala308 and the first
carbonyl of the pantetheine moiety, van der Waals contacts dominate the remaining
interactions between CHS and CoA. The pantetheine arm of the CoA extends into the
enzyme, positioning the terminally bound thioester-linked substrates near Cys164.

Both naringenin and resveratrol bind at the active site end of the CoA-binding
tunnel. The interactions observed in the naringenin and resveratrol complexes define
the coumaroyl-binding and cyclization pockets. The space to the lower left of the
CoA-binding tunnel's end serves as the coumaroyl-binding pocket. Residues of this
pocket (Ser133, Glu192, Thr194, Thr197, and Ser338) surround the coumaroyl-derived
portion of the bound naringenin and resveratrol molecules and interact primarily
through van der Waals contacts. However, the carbonyl oxygen of Gly216 hydrogen
bonds to the phenolic oxygen of both naringenin and resveratrol and the hydroxyl of
Thr197 interacts with the carbonyl of naringenin derived from coumaroyl-CoA. The
identity of the residues in this pocket likely contributes to the preference for
coumaroyl-CoA as a substrate for parsley CHS over other cinnamoyl-CoA starter
molecules, like caffeoyl- or feruloyl-CoA.

In both the naringenin and resveratrol complexes, the malonyl-derived portion
of each molecule occupies a large pocket adjacent to Cys164, suggesting this is where
the polyketide reaction intermediate cyclizes into the new ring system and where
aromatization of the ring occurs. The six-carbon chain of hexanoyl-CoA also binds in
this pocket. Physically, the size of the pocket limits the number of acetate additions to
three. Phe265 separates the coumaroyl-binding site from the cyclization pocket and
may function as a mobile steric gate during successive rounds of polyketide
elongation. Although a polyketide possesses a number of hydrogen bond acceptors
through which specific interactions could aid in proper folding for the cyclization
reaction, the residues of the cyclization pocket, including Thr132, Met137, Phe215, Ile254,
Gly256, Phe265, and Pro375, provide few potential hydrogen bond donors. As in the
coumaroyl-binding pocket, van der Waals contacts dominate the interaction between
CHS and both naringenin and resveratrol. Thus, the surface topology of the
cyclization pocket dictates how the malonyl-derived portion of the polyketide is
folded and how the stereochemistry of the cyclization reaction leading to chalcone
formation in CHS and resveratrol formation in STS is controlled.

Reaction mechanism

The positions of the CoA thioesters and product analogs in the CHS active site
suggest binding modes for substrates and intermediates in the polyketide elongation
mechanism that are consistent with the known product specificity of CHS. In
addition, the stereochemical features of the substrate and product analog complexes
elucidate the roles of Cys164, Phe215, His303, and Asn336 in the reaction mechanism.
Utilizing structural constraints derived from the available complexes, the following
reaction sequence is proposed (see Figure 6).

In the mechanism, binding of p-coumaroyl-CoA initiates the CHS reaction.
Functional and structural evidence supports a coumaroyl-first mechanism over a
malonyl-first one. Cerulenin, a potent irreversible inhibitor of CHS, covalently
modifies Cys164 in CHS (Lanz, et al., J. Biol. Chem. 266:9971-9976, 1991).
Preincubation of CHS with coumaroyl-CoA prevents inactivation by cerulenin, but
preincubation with malonyl-CoA does not (Preisig-Mueller, et al., Biochemistry
36:8349-8358, 1997). Also, the location of the coumaroyl-derived portion of
naringenin and resveratrol in the CHS complexes agrees with a coumaroyl-first
mechanism, since the presence of a triketide reaction intermediate attached to Cys164
would limit access to the coumaroyl-binding pocket.

After p-coumaroyl-CoA binds to CHS, Cys164, activated by His303, attacks the
thioester linkage, transferring the coumaroyl moiety to Cys164 (Monoketide
Intermediate). Asn336 hydrogen bonds with the carbonyl oxygen of the thioester,
further stabilizing formation of the tetrahedral reaction intermediate. CoA then
dissociates from the enzyme, leaving a coumaroyl-thioester at Cys164. Binding of the
first malonyl-CoA positions the bridging methylene carbon of the malonyl moiety
near the carbonyl carbon of the covalently attached coumaroyl-thioester.
Decarboxylation of malonyl-CoA leads to carbanion formation. Resonance between
the keto and enol species stabilizes the carbanion. Attack of this carbanion on the
coumaroyl-thioester releases the thiolate anion of Cys164 and transfers the coumaroyl
group to the acetyl moiety of the CoA thioester (Diketide CoA Thioester). Capture of
this elongated diketide-CoA by Cys164 and release of CoA sets the stage for two
additional rounds of elongation resulting in formation of the tetraketide reaction
intermediate.
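
The elongation cycle described above can be summarized as a small bookkeeping
model. The sketch below is a simplified illustration, not part of the disclosed methods:
the class and function names are hypothetical, and the carbon counts (nine carbons
for the coumaroyl starter, two carbons added per decarboxylated malonyl unit) are
general chemical assumptions rather than values stated in the text.

```python
from dataclasses import dataclass

KETIDE_NAMES = {1: "monoketide", 2: "diketide", 3: "triketide", 4: "tetraketide"}


@dataclass(frozen=True)
class Intermediate:
    ketide_units: int  # starter counts as one unit
    carbons: int       # carbons in the acyl chain
    carrier: str       # "Cys164" or "CoA"


def load_starter(starter_carbons):
    """Transfer of the starter acyl group from CoA onto Cys164."""
    return Intermediate(1, starter_carbons, "Cys164")


def condense(enzyme_bound):
    """Decarboxylative condensation: malonyl-CoA loses CO2 and adds a C2 unit.

    The elongated chain leaves Cys164 and ends up on the CoA thioester.
    """
    if enzyme_bound.carrier != "Cys164":
        raise ValueError("condensation requires a Cys164-bound intermediate")
    return Intermediate(enzyme_bound.ketide_units + 1, enzyme_bound.carbons + 2, "CoA")


def recapture(coa_bound):
    """Capture of the elongated polyketide-CoA by Cys164 with release of CoA."""
    return Intermediate(coa_bound.ketide_units, coa_bound.carbons, "Cys164")


def run_elongation(starter_carbons=9, rounds=3):
    """Return the Cys164-bound intermediates after loading and each round."""
    state = load_starter(starter_carbons)
    history = [state]
    for _ in range(rounds):
        state = recapture(condense(state))
        history.append(state)
    return history


history = run_elongation()
final = history[-1]
for step in history:
    print(KETIDE_NAMES[step.ketide_units], step.carbons, step.carrier)
```

In this simplified model, three rounds starting from the coumaroyl monoketide end at
a fifteen-carbon tetraketide bound to Cys164, matching the sequence of intermediates
named in the text.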

Asn336 appears to play a crucial role in the decarboxylation reaction.
Structural evidence shows that the decarboxylation reaction does not require transfer
of the malonyl moiety to Cys164 as originally indicated by CO2 exchange assays.
Decarboxylation occurs without Cys164, since the C164S mutant produces acetyl-CoA
as determined crystallographically and confirmed by a functional assay. In the
hexanoyl-CoA complex, the side chain amide of Asn336 provides a hydrogen bond to
the carbonyl oxygen of the thioester. This interaction would stabilize the enolate
anion resulting from decarboxylation of malonyl-CoA (see Figure 6). At the same
time, the lack of formal positive charge at Asn336 may preserve the partial carbanion
character of this resonance-stabilized anion, and thus the nucleophilicity of the
carbanion form.

The role of Phe215 in the catalytic mechanism is subtler than that of Asn336. Its
position in both CoA complexes suggests that it provides van der Waals interactions
for substrate binding. However, its conservation in bacterial enzymes related to CHS
that do not make flavonoids or stilbenes may indicate a more general catalytic role for
Phe215. Its position near the acetyl moiety of the malonyl-CoA complex suggests that
it participates in decarboxylation by favoring conversion of the negatively charged
carboxyl group to a neutral carbon dioxide molecule.

Figure 7A depicts the addition of the third malonyl-CoA molecule as a
three-dimensional model. The position of the coumaroyl ring in the modeled triketide
intermediate is as observed in the naringenin and resveratrol complexes. The
coumaroyl-binding pocket locks this moiety in position, while the acetate units added
in subsequent chain extension steps bend to fill the cyclization pocket. The backbone
of bound hexanoyl-CoA provides a guide for modeling the triketide reaction
intermediate attached to Cys164. Based on the observed acetyl-CoA complex, a
rotation of the acetyl group would place the terminal methylene of the decarboxylated
malonyl-CoA in position for nucleophilic attack on the triketide thioester linkage,
resulting in formation of a tetraketide CoA thioester.

The cyclization reaction catalyzed by CHS is an intramolecular Claisen
condensation encompassing the three acetate units derived from three malonyl-CoAs.
During cyclization, the nucleophilic methylene group nearest the coumaroyl moiety
attacks the carbonyl carbon of the thioester linked to Cys164. Ring closure proceeds
through an internal proton transfer from the nucleophilic carbon to the carbonyl
oxygen. Modeling of the tetraketide intermediate in a conformation leading to
chalcone formation places one of the acidic protons of the nucleophilic carbon (C6)
proximal to the target carbonyl (C1) (see Figure 7B). Since there is no base capable of
proton abstraction from the tetraketide, it is proposed that the intermediate itself
provides the driving force for carbanion formation. Protonation of the carbonyl
oxygen would also stabilize the negative charge on the tetrahedral intermediate.
Breakdown of this tetrahedral intermediate expels the newly cyclized ring system
from Cys164. Subsequent aromatization of the trione ring through a second series of
facile internal proton transfers yields chalcone.

Although the cyclization reaction has been modeled as occurring via a
polyketide intermediate attached to Cys164, it is possible that the reaction proceeds
when the polyketide is attached to CoA. The rate of cyclization versus the rate of
reattachment to Cys164 would dictate which of the two cyclization alternatives is
mechanistically preferred.

An important question in the biosynthesis of chalcones concerns the
exchangeability of the polyketide reaction intermediates. In the presence of chalcone
reductase (CHR), CHS produces 6-deoxychalcone (Welle & Grisebach, FEBS Lett.
236:221-225, 1988). Mechanistically, CHR must reduce a ketone on the polyketide
intermediate before cyclization occurs. Based on the CHS structure, any polyketide
attached to Cys164 would be inaccessible to CHR unless a drastic structural change
occurs in CHS upon interaction with CHR. While this conformational change is
possible, such a change is difficult to imagine given the buried nature of the CHS
active site. This would argue for the presence of moderately exchangeable polyketide-
CoA reaction intermediates. Consistent with this idea, a recently identified CHS-like
enzyme from Pinus strobus involved in the biosynthesis of C-methylated chalcones is
active only with a starter molecule that is sterically analogous to the diketide-CoA
intermediate postulated to be formed after the first condensation reaction in CHS (ref. 30).
These results suggest that the enzymes involved in the biosynthesis of plant
polyketides may require specific localization in the plant cell to allow efficient
channeling of intermediates from one enzyme to another during the production of
particular products.

Cyclization specificity of CHS and STS

Elucidation of the structure of CHS provided mechanistic insight into the CHS
reaction and revealed its active site configuration. Homology modeling and sequence
alignments suggested that evolutionary functional divergence of the CHS superfamily
(type III PKSs) occurs through preservation of the catalytic residues combined with
steric variation of other active site residues. Elucidation of the structure of 2-PS
confirms this 'steric modulation' model by revealing that substrate and product
specificity differences are achieved by only three active site mutations, as suggested
by a homology model of 2-PS based upon the CHS 3D structure.

However, with these structures alone, the structural determinants of the
alternate cyclization seen in the stilbene synthase (STS) subfamily of CHS-like
enzymes remained unknown. STS makes the same tetraketide intermediate as CHS,
but cyclizes it differently (C2->C7 attack instead of C6->C1). STS evolved from CHS
independently at least three times, with no clear STS consensus sequence.

Elucidation of the structure of pine (Pinus sylvestris) STS according to the
present invention reveals a similar active site configuration, with minor differences.
Furthermore, an 18xCHS mutant encompassing the observed STS structural backbone
differences proves to have activity and kinetics similar to STS (see Figure 18),
confirming that the observed structural differences between CHS and STS are relevant
to mechanistic differences.

It was further determined that ten of the eighteen mutations in 18xCHS are
neutral (not related to functional conversion, i.e. an alteration in CHS activity),
and an 8xCHS mutant with similar STS-like activity was made. All of the 8xCHS
changes are clustered in a single area, although encoded on three different stretches of
primary sequence (see Figure 19). This area is thus implicated as important for STS-
like versus CHS-like cyclization.

Elucidation of the structure of peanut (Arachis hypogaea) STS, as well as of
the 18xCHS engineered STS, shows similar three-dimensional conformational changes
in the area implicated by the 8xCHS mutagenic conversion of CHS to STS (see Figure
20). This implies that a single 3D solution to the CHS to STS conversion problem has
been found by all three STS subfamilies, despite variation in primary sequence. A
compensatory increase in bulkiness at CHS residue 98 seems to be involved in all
three families of STS.

A closer look at where the altered region meets the active site (see Figure 22)
reveals a consistent change in STS-like enzymes that suggests a cyclization switch
mechanism (see Figure 21), involving movement of Thr132 to allow a hydrogen-bond
chain to transfer an electron from Glu192, through Thr132 and a water (bonded to
Ser338). This electron is proposed to encourage hydrolysis of the tetraketide intermediate
off of the catalytic cysteine, where decarboxylation of the terminal carboxyl group
drives the STS reaction toward a C2->C7 cyclization. In CHS, this hydrolysis does
not occur, and so the C6->C1 cyclization is encouraged, as it serves to break the
thioester bond to cysteine.

To test this proposed mechanism, various mutations were made in the 18xCHS
engineered STS enzyme, in an attempt to revert the product specificity back to that of
CHS, without reversing the other structural changes. Single mutations designed to
disrupt only the hydrogen-bonding character in the relevant region succeeded in
reverting the activity of 18xCHS from STS-like to CHS-like. A few of these mutants
produce almost equal amounts of resveratrol and chalcone, which might be useful
when engineered into a plant. This way, the beneficial resveratrol antifungal natural
product could be made, without completely abrogating the vital CHS-like activity
necessary in plants.

The residue implicated as the crucial base for STS-like behavior (Glu192) is
not altered in STS. Instead, the adjacent Thr132 changes positions. As a further test
of the proposed aldol mechanism, the residue equivalent to CHS Glu192 was mutated
to Gln in both the pine and peanut STS wild type enzymes. As predicted, both of
these single mutants made more chalcone and less resveratrol than the wild type STS
enzymes. The ratio of products supports the proposed mechanism. The decrease in
overall activity of these mutants is due to the fact that Glu192 is also important for
folding and/or stability, apart from its role in cyclization specificity.

Structural basis for functionally novel CHS-like enzymes 

Absolute conservation of Cys164, Phe215, His303, and Asn336 occurs in CHS-like
sequences, including several bacterial proteins possessing very low (typically 20-30%)
amino acid sequence identity. Moreover, all CHS-like proteins exhibit strong
conservation of residues shaping the geometry of the active site. Although the
functions of the bacterial CHS-like proteins remain unknown, these enzymes likely
form polyketides or polyketide-CoA thioesters in a manner resembling CHS.
However, steric differences resulting from sequence variation in both the coumaroyl-
binding pocket and the cyclization pocket strongly suggest alternate substrate and
product specificity in the bacterial enzymes.

The sequence databases include approximately 150 plant enzyme sequences
classified as CHS-like proteins. The substrate and product specificity of a majority of
these sequences remains to be determined. In addition, the high sequence similarity of
all plant sequences complicates classification of these sequences as authentic CHS,
STS, ACS, or BBS enzymes. The information provided by the three-dimensional
structure of CHS should make new substrate and product specificity more readily
discernible from sequence information.

To illustrate the usefulness of structural information in identifying potentially
new activities, a CHS-related sequence from Gerbera hybrida (GCHS2; ref. 32) that is 74%
identical with alfalfa CHS2 was examined. Modeling the active site architecture of
GCHS2 using the structure of alfalfa CHS2 as a template indicates that GCHS2 will
not catalyze either the CHS-like or STS-like reaction (see Figure 8). This variation in
reaction specificity results from striking steric differences in the coumaroyl binding
and cyclization pockets that substantially reduce the volume of both pockets from 923
Å³ in CHS to 269 Å³ in GCHS2. Side chain variation at positions 197 and 338 alters
the coumaroyl binding pocket, while the identity of residue 256 dictates major steric
changes in the cyclization pocket. The reduced size of these pockets in GCHS2
suggests that fewer than three acetate additions will occur, and that a CoA thioester
with an acyl moiety smaller than p-coumaroyl initiates the reaction. Recent functional
characterization of GCHS2 confirms this prediction and demonstrates that this
enzyme uses acetyl-CoA or benzoyl-CoA and two condensation reactions with
malonyl-CoA to form pyrone products (Eckermann, et al., Nature 396:387-390,
1998).
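
The scale of the pocket contraction quoted above can be made explicit with a short
calculation. The sketch below simply restates the two modeled volumes from the text
as a fractional change; the function name is illustrative, and the percentage is a
derived figure that does not appear in the source.

```python
def fractional_reduction(reference_volume, reduced_volume):
    """Fraction of the reference volume that is lost in the smaller cavity."""
    return (reference_volume - reduced_volume) / reference_volume


chs_volume = 923.0    # cubic angstroms, CHS pockets (from the text)
gchs2_volume = 269.0  # cubic angstroms, modeled GCHS2 pockets (from the text)

reduction = fractional_reduction(chs_volume, gchs2_volume)
retained = gchs2_volume / chs_volume

print(f"volume lost     : {reduction:.1%}")
print(f"volume retained : {retained:.1%}")
```

On these figures the modeled GCHS2 pockets retain somewhat less than one third of
the CHS volume, in line with the prediction of a smaller starter and fewer acetate
additions.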

Crystallization of Additional Polyketide Synthases 

Stilbene synthase from Pinus sylvestris was overexpressed in E. coli as an
octahistidyl N-terminal fusion protein, purified to >90% homogeneity by metal
affinity and gel filtration chromatography, and crystallized from a preparation lacking
the N-terminal polyhistidine tag (removed by thrombin cleavage) from 13% (w/v)
polyethylene glycol (PEG 8000), 0.05 M MOPSO, 0.3 M ammonium acetate at pH 7.0.
This STS is 396 amino acids in length and, like alfalfa CHS, exists as a homodimer in
solution. The structural coordinates of this pine STS are presented in Appendix A.
STS from Arachis hypogaea was similarly expressed and crystallized. The structural
coordinates of this peanut STS are presented in Appendix B.

2-Pyrone synthase (2-PS) from Gerbera hybrida was expressed and purified
from E. coli in a similar manner to CHS and STS. Crystals were obtained from 1.5 M
ammonium sulfate, 0.1 M Na+-succinate, 0.002 M DTT at pH 5.5.

2-Pyrone synthase (2-PS) from Gerbera hybrida forms a triketide from an
acetyl-CoA initiator and two acetyl-CoA α-carbanions derived from decarboxylation
of two malonyl-CoAs that cyclizes into 6-methyl-4-hydroxy-2-pyrone. In
comparison, alfalfa chalcone synthase 2 (CHS2; 74% amino acid sequence identity to
2-PS) condenses p-coumaroyl-CoA and three acetyl-CoA α-carbanions derived from
decarboxylation of three malonyl-CoAs into a tetraketide that cyclizes into chalcone.
A homology model of 2-PS based on the structure of CHS suggested that the 2-PS
initiation/elongation cavity is smaller than that of CHS. A smaller cavity would
account for the terminal formation of a triketide intermediate prior to cyclization by
2-PS.

Expression, Purification and Crystallization of 2-PS. 

2-PS was expressed in E. coli, purified and crystallized as described above.
Gerbera hybrida 2-PS was expressed in E. coli using the pHIS8 vector and was
purified as described for CHS. 2-PS crystals grew at 4 °C in hanging-drops
containing a 1:1 mixture of 25 mg ml⁻¹ protein and crystallization buffer (1.5 M
ammonium sulfate, 50 mM succinic acid (pH 5.5), and 5 mM DTT). Before freezing
at 105 K, crystals (P3₁21; unit cell dimensions a = 82.15 Å, c = 241.33 Å; one 2-PS
dimer per asymmetric unit) were stepped through stabilizer (50 mM succinic acid (pH
5.5), 50 mM ammonium sulfate, and 5 mM DTT) containing 5 mM acetoacetyl-CoA
and increasing concentrations of glycerol (30% (v/v) final). Diffraction data were
collected using a DIP2030 imaging plate system and CuKα radiation produced by a
rotating anode (wavelength 1.54 Å). All images were processed with
DENZO/SCALEPACK (Z. Otwinowski, W. Minor, Methods Enzymol. 276:307
(1997)). A total of 179,623 reflections were merged to give 60,824 unique reflections
(98.2% complete overall to 2.05 Å and 98.1% complete in the highest resolution shell)
with an Rsym = 0.042 (0.206 in the highest resolution shell) and an I/σ of 21.7 (4.5 in
the highest resolution shell). The structure of 2-PS complexed with acetoacetyl-CoA
was determined by molecular replacement using CHS as a search model and was
refined to 2.05 Å resolution. The overall fold of 2-PS is the αβαβα motif found in
CHS and β-ketoacyl synthase II (KAS II). In addition, the positions of the catalytic
residues of 2-PS (Cys169, His308, and Asn341), CHS (Cys164, His303, Asn336), and KAS
II (Cys163, His303, and His340) are structurally analogous. As expected from sequence
homology, the structures of 2-PS and CHS are nearly identical and superimpose with an
r.m.s. deviation of 0.64 Å for the two proteins' α-carbon atoms. Similar to CHS, the
2-PS dimerization surface buries 1805 Å² of surface area per monomer and a loop
containing a cis-peptide bond between Met142 and Pro143 allows the methionine of one
monomer to protrude into the adjoining monomer's active site. Thus, dimerization
allows formation of the complete 2-PS active site.
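
The merging statistics reported above imply an average multiplicity and an
approximate number of theoretically possible unique reflections. The sketch below
derives both from the quoted counts; the function names are illustrative, the derived
values are not given in the source, and the second figure assumes that the stated
completeness applies to the 60,824 unique reflections.

```python
def multiplicity(total_observations, unique_reflections):
    """Average number of measurements merged into each unique reflection."""
    return total_observations / unique_reflections


def possible_unique(unique_reflections, completeness):
    """Estimated number of unique reflections expected at 100% completeness."""
    return unique_reflections / completeness


total_obs = 179_623      # measured reflections (from the text)
unique_obs = 60_824      # unique reflections after merging (from the text)
overall_complete = 0.982  # overall completeness to 2.05 angstroms (from the text)

mult = multiplicity(total_obs, unique_obs)
expected = possible_unique(unique_obs, overall_complete)

print(f"average multiplicity        : {mult:.2f}")
print(f"estimated possible uniques  : {expected:,.0f}")
```

An average multiplicity of about three is modest, so the quoted Rsym and I/σ values
are best read together with it rather than in isolation.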

Acetoacetyl-CoA is a reaction intermediate of 2-PS. Electron density for the
ligand is well defined in the 2-PS active site and shows that the acetoacetyl moiety
extends from the CoA pantetheine arm into a large internal cavity. The electron
density also reveals oxidation of the catalytic cysteine's (Cys169) sulfhydryl to sulfinic
acid (-SO2H). This oxidation state prevents formation of a covalent acetoacetyl-
enzyme complex but allows trapping of the bound acetoacetyl-CoA intermediate.
Extensive protein-ligand contacts position CoA at the entrance to the active site and
orient the acetoacetyl moiety at the end of a 15 Å long tunnel that opens into a cavity
that defines the initiation and elongation steps of polyketide formation.

The 2-PS active site cavity consists of twenty-seven residues from one
monomer and Met142 from the adjoining monomer. Phe220 and Phe270 mark the
boundary between the CoA binding site and the initiation/elongation cavity. Near the
CoA thioester, Cys169, His308, and Asn341 form the catalytic center of 2-PS. These
residues are conserved in all homodimeric iterative PKSs. Based on this, catalytic
roles were proposed for each residue that are analogous to the corresponding residues
in CHS. Cys169 acts as the nucleophile in the reaction and as the attachment site for
the elongating polyketide chain. Interaction between His308 and Cys169 maintains the
thiolate required for condensation of the starter molecule. His308 and Asn341 catalyze
malonyl-CoA decarboxylation and stabilize the transition states during the
condensation steps by forming an oxyanion hole that accommodates the negatively
charged tetravalent transition state. Following the first condensation reaction, a
diketide remains attached to Cys169. The second malonyl-CoA then binds, undergoes
decarboxylation, and the resulting nucleophilic acetyl-CoA α-carbanion performs a
second condensation reaction with the enzyme bound diketide, ultimately generating
the triketide that cyclizes into methylpyrone.

Comparison of the initiation/elongation cavities of 2-PS and CHS reveals four
amino acid differences. In 2-PS, Leu202, Met259, Leu261, and Ile343 replace Thr197,
Ile254, Gly256, and Ser338, respectively, of CHS. These four substitutions reduce cavity
volume from 923 Å³ in CHS to 274 Å³ in 2-PS. A model of methylpyrone in the
2-PS cavity, based on the position of acetoacetyl-CoA, emphasizes the volume change
compared to the CHS-naringenin complex (Accession No. 1CGK). Leu202 and Ile343
occlude the portion of the 2-PS cavity corresponding to the coumaroyl-binding site of
CHS. Replacement of Gly256 in CHS by Leu261 in 2-PS severely reduces the size of
the active site cavity. Substitution of Met259 in 2-PS for Ile254 in CHS produces a
modest alteration in cavity volume. To examine the functional importance of these
amino acid differences, the initiation/elongation cavity of CHS was altered by
mutagenesis to resemble that of 2-PS. The resulting mutant proteins were screened
for activity using either p-coumaroyl-CoA or acetyl-CoA as starter molecules.
Activities of 2-PS, CHS, and the CHS mutants were determined by monitoring
product formation using a TLC-based radiometric assay. Assay conditions were 100
mM Hepes (pH 7.0), 30 µM starter-CoA (either p-coumaroyl-CoA or acetyl-CoA),
and 60 µM [14C]-malonyl-CoA (50,000 cpm) in 100 µl at 25 °C. Reactions were
quenched with 5% acetic acid, extracted with ethyl acetate, and applied to TLC plates
and developed. Due to the spontaneous cyclization of chalcone into the flavanone
naringenin, activities of CHS are referenced to naringenin formation.

The X-ray crystal structures of 2-PS and CHS imply that the size of the active
site cavity limits polyketide length and modulates folding of the polyketide chain.
Wild-type CHS generates the tetraketide chalcone and 2-PS produces the triketide
methylpyrone. Likewise, the CHS I254M mutant also yields chalcone. Interestingly,
the T197L, G256L, and S338I mutants do not form chalcone. Crystallographic
analysis of the G256L and S338I mutants demonstrates that the substituted side-chains
adopt conformations similar to the corresponding residues in 2-PS without altering the
position of the protein backbone. Since the T197L, G256L, and S338I mutants altered
product formation, a CHS triple mutant was generated. Consistent with the proposal
that cavity volume dictates polyketide length, the T197L/G256L/S338I mutant
produces only methylpyrone, as confirmed by liquid chromatography/mass
spectroscopy (LC/MS). LC/MS/MS analysis was performed by the Mass Spectroscopy
facility of the Scripps Research Institute. Scaled-up assays (2 ml reaction volume)
with the CHS T197L/G256L/S338I mutant and 2-PS were performed. Extracts were
analyzed on a Hewlett-Packard HP 1100 MSD single quadrupole mass spectrometer
coupled to a Zorbax SB-C18 column (5 µm, 2.1 mm x 150 mm). HPLC conditions
were as follows: gradient system from 0 to 100% methanol in water (each containing
0.2% acetic acid) within 10 min; flow rate 0.25 ml min⁻¹. LC/MS/MS data from both
reactions were identical: 6-methyl-4-hydroxy-2-pyrone, Rt = 5.068 min; [M-H]⁻ 125;
[M-H-CO2]⁻ 81. The numbers show m/z values with relative intensities in
parentheses. The observed fragmentation matches previously published data.
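The reported ions can be checked against the molecular formula of 6-methyl-4-hydroxy-2-pyrone. The following is a minimal sketch, assuming the formula C6H6O3 for methylpyrone and standard monoisotopic atomic masses; neither the formula nor the helper names appear in the text above, so they should be read as illustrative assumptions rather than part of the described method.

```python
# Monoisotopic masses (u) of the most abundant isotopes; standard reference values.
MONOISOTOPIC = {"C": 12.0, "H": 1.00782503, "O": 15.99491462}
ELECTRON = 0.00054858  # electron mass (u)

def mono_mass(formula):
    """Monoisotopic mass of a formula given as {element: count}."""
    return sum(MONOISOTOPIC[element] * count for element, count in formula.items())

def deprotonated_mz(formula):
    """m/z of the singly charged [M-H]- ion: remove one H atom, keep its electron."""
    return mono_mass(formula) - MONOISOTOPIC["H"] + ELECTRON

# Assumed formula for 6-methyl-4-hydroxy-2-pyrone (not stated in the text).
methylpyrone = {"C": 6, "H": 6, "O": 3}
co2 = {"C": 1, "O": 2}

parent = deprotonated_mz(methylpyrone)   # [M-H]-
fragment = parent - mono_mass(co2)       # [M-H-CO2]-

print(round(parent), round(fragment))    # nominal m/z values
```

Under these assumptions the nominal values come out as 125 and 81, in line with the [M-H]⁻ and [M-H-CO2]⁻ ions reported for both reactions.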

In addition, the size of the cavity in 2-PS and CHS confers starter molecule
specificity. 2-PS accepts acetyl-CoA but does not use p-coumaroyl-CoA.
Structurally, the constricted 2-PS active site excludes the bulky coumaroyl group. As
such, incubation of 2-PS in the presence of coumaroyl-CoA and malonyl-CoA yields
methylpyrone produced from three malonyl-CoA molecules. In comparison, the
larger initiation/elongation cavity of CHS allows differently sized aliphatic and
aromatic starter molecules to be used in vitro with varying efficiencies. CHS exhibits
a 230-fold preference for p-coumaroyl-CoA versus acetyl-CoA. Alterations in the
active site cavity of CHS affect starter molecule preference. The CHS I254M mutant
is functionally comparable to wild-type enzyme with a modest reduction in specific
activity. The T197L and S338I mutants exhibit 10-fold and 3-fold preferences,
respectively, for coumaroyl-CoA. Moreover, both form a distinct product using
coumaroyl-CoA as a starter molecule. In contrast, the G256L mutant favors
acetyl-CoA 3-fold. Like 2-PS, the CHS T197L/G256L/S338I (3x) mutant only accepts
acetyl-CoA (or malonyl-CoA) as the starter molecule.

Functional diversity among other homodimeric iterative PKSs, like
p-coumaroyltriacetic acid synthase (CTAS), acridone synthase (ACS), and the rppA
protein from Streptomyces griseus, likely results from variations of residues lining the
initiation/elongation cavity. As demonstrated, positions 197, 256, and 338 distinguish
between tetraketide products derived from a final Claisen condensation in wild-type
CHS and triketide products derived from an enolate-directed condensation in the CHS
triple mutant. Although CHS, CTAS, and ACS generate tetraketides, each enzyme
differs in either the cyclization reaction or in the identity of the starter molecule.
CTAS forms the same enzyme-bound tetraketide as CHS but does not catalyze the
final cyclization reaction. Comparison of these two enzymes reveals that substitution
of Thr197 in CHS with an asparagine in CTAS may prevent the covalently-bound
tetraketide intermediate from undergoing cyclization into chalcone. ACS uses
N-methylanthranoyl-CoA as a starting substrate to produce the alkaloid acridone. Three
differences between CHS (Thr132, Ser133, and Phe265) and ACS (Ser132, Ala133, and
Val265) may alter starter molecule specificity. In ACS, these changes likely widen the
portion of the cavity corresponding to the p-coumaroyl-binding site in CHS to
accommodate N-methylanthranoyl-CoA binding. Comparative changes in the active
site cavity allow formation of longer polyketides. The rppA protein forms a
pentaketide from five acetates derived from malonyl-CoA decarboxylation. Thr137,
Ala138, Thr199, Leu202, Met259, Leu261, Leu268, Pro304, and Ile343 of 2-PS are replaced by
Cys106, Thr107, Cys168, Cys171, Ile228, Tyr230, Phe237, Ala261, and Ala295, respectively, in
the rppA protein. Models of the rppA protein based on the 2-PS and CHS structures
show that cavity volume is 1145 Å³ in the rppA protein versus 274 Å³ in 2-PS (or
923 Å³ in CHS). Manipulation of the active site through amino acid substitutions
offers a strategy for increasing the molecular diversity of polyketide formation through
both the choice of starter molecule and the number of subsequent condensation steps.
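The relationship between cavity volume and polyketide length described above can be tabulated directly. The sketch below uses only the three volumes and product lengths quoted in the text (the rppA value comes from a model, not a crystal structure); the variable names are illustrative, and three data points only show an ordering, not a quantitative rule.

```python
# (enzyme, cavity volume in cubic angstroms, ketide units in the product)
# Volumes and product lengths are those quoted in the text above.
cavities = [
    ("CHS", 923, 4),            # tetraketide chalcone
    ("2-PS", 274, 3),           # triketide methylpyrone
    ("rppA (model)", 1145, 5),  # pentaketide
]

# Sort by cavity volume and check that product length rises in the same order.
by_volume = sorted(cavities, key=lambda entry: entry[1])
lengths_in_volume_order = [entry[2] for entry in by_volume]
volume_tracks_length = lengths_in_volume_order == sorted(lengths_in_volume_order)

# Fold difference in cavity volume between CHS and 2-PS.
chs_to_2ps_ratio = 923 / 274

for name, volume, units in by_volume:
    print(f"{name:14s} {volume:5d} A^3  {units} ketide units")
print(f"ordering consistent: {volume_tracks_length}; CHS/2-PS volume ratio: {chs_to_2ps_ratio:.2f}")
```

The ordering is consistent for these three enzymes, and the CHS cavity is a little more than three times the volume of the 2-PS cavity.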

The reaction mechanism for polyketide formation and the structural basis for
controlling polyketide length described here may be shared with other more complex
iterative (e.g., actinorhodin (act) PKS and tetracenomycin (tcm) PKS) and modular
PKSs (e.g., 6-deoxyerythronolide B synthase (DEBS)). The structural similarity of
the 2-PS, CHS, and KAS II active sites, the sequence homology of KAS II and the
ketosynthases of act PKS, tcm PKS, and DEBS, and mutagenesis studies of CHS and
act PKS demonstrating similar roles for the catalytic residues of each protein indicate
that a conserved active site architecture catalyzes similar reactions in these enzymes.

As in 2-PS and CHS, the volume of the active site cavities in other PKSs
likely limits the size of the final polyketide. For example, act PKS and tcm PKS
generate octaketide and decaketide products, respectively, at a single active site. This
suggests that the active site cavities of these PKSs differ in size, and are larger than
those of 2-PS or CHS. Similarly, the ketosynthases of different DEBS modules accept
polyketide intermediates ranging in length from five to twelve carbons. Modular
PKSs, like DEBS, use an assembly-line system in which an individual module
catalyzes one elongation reaction and passes the growing polyketide to the next
module. Although the ketosynthase domains of DEBS are functionally permissive,
modulation of active site volume in each module's ketosynthase would provide
selectivity for the proper sized intermediate at each elongation step. Structural
differences among PKSs alter the volume of the initiation/elongation cavity to allow
discrimination between starter molecules and to vary the number of elongation steps
to ultimately direct the nature and length of the polyketide product.

Functional Conversion of Chalcone Synthase to Stilbene Synthase 

All CHS-like enzymes utilize a small number of absolutely conserved catalytic
residues within a single active site to catalyze the iterative addition of acetate units to
a starter molecule. A chalcone synthase reaction sequence starts with initiation, is
followed by elongation, and ends with cyclization (see Figure 10). CHS family
members differ in their choice of starter molecule, number of acetyl additions, and
cyclization pathway of the resulting polyketide. Structural and functional
characterization of CHS from M. sativa suggested that substrate specificity is
modulated in the chalcone synthase superfamily by steric constraints. Such
constraints are provided by a set of variable residues lining the active site. Functional
conversion through mutagenesis of alfalfa CHS to a pyrone synthase, and the
structural characterization of pyrone synthase (PS) from G. hybrida (daisy) support
this model. Thus, homology modeling is a valid approach to gain insight into the
specificities of chalcone synthase superfamily members, including members that are
identified and/or characterized as well as those still to be identified and characterized.

Stilbene synthase (STS) is related to CHS, and is thought to have arisen from
CHS on at least three independent occasions. An amino acid sequence alignment of
P. sylvestris STS and M. sativa CHS, along with an evolutionary intermediate,
P. sylvestris CHS, shows amino acid sequence homology (Figure 11). Both CHS and
STS form the same linear phenylpropanoid tetraketide intermediate via the sequential
condensation of three acetyl units derived from decarboxylation of malonyl-CoA with
one coumaroyl-CoA starter (Figure 12). STS forms resveratrol via an intramolecular
aldol condensation. In contrast, CHS utilizes an intramolecular Claisen condensation
to produce chalcone (Figure 13).

Functional conversion is achieved by mutations of CHS. Mutation of M. sativa
(alfalfa) CHS confers wild type STS activity, resulting in an STS-like product profile
from mutant CHS activity. Specifically, alfalfa wild type CHS activity results in the
production of the plant polyketide synthase product naringenin, a flavanone
resulting from spontaneous ring closure of the chalcone product. Mutant CHS activity
results in the production of resveratrol, an expected product of wild type STS activity,
and a decrease in the production of naringenin (see Figure 14).

Based on the structural information, a variety of mutant CHS molecules can be
designed. Mutant CHS enzymes can vary with respect to starter preference, activity,
product formation, and the like. Various CHS mutants, as shown in Table 3 above,
were designed by invention methods, prepared, and tested for activity.

Mutant CHS has altered activity based on assays conducted with [14C]-malonyl-CoA.
Products were extracted with ethyl acetate, analyzed by silica gel thin layer
chromatography (TLC), and visualized by autoradiography. Mutants 14B and 2B
showed reduced amounts of naringenin compared to wild type CHS and little or no
resveratrol. Mutants 16B, 4B, 6B, 18x, and 22x showed reduced amounts of
naringenin compared to wild type CHS and various amounts of resveratrol. Mutants
18xCHS and 22xCHS showed the lowest naringenin amounts and the highest
resveratrol amounts. In fact, in 22x the naringenin:resveratrol ratio is similar to that
seen with wild type STS from P. sylvestris.

Specific mutations in 18xCHS by area are as follows, with areas underlined
showing residue changes especially important for altering activity: A1: D96A, V98L,
V99A, V100M; A2: T131S, S133T, G134T, V135P, M137L; A3: Y157V, M158G,
M159V, Y160F, Q165H; A4: L268K, K269G, D270A, G273D.

The 22x mutant consists of 18xCHS plus four additional mutations in area B1,
which flanks A4 and bridges the gap between A1-A3 and A4 (see Figure 16). The
22xCHS has decreased naringenin production (relative to 18xCHS), matching
identically the product profile of wild type STS. These mutations are in an area
predicted to be important for cyclization specificity, due to changes seen here in
comparing the CHS/resveratrol complex structure to apo and other complexes of
CHS. Note that the final mutation is only two residues before the first change in the
A4 region.

Specific mutations in 22xCHS by area are as follows, with areas underlined
showing residue changes especially important for altering activity: A1: D96A, V98L,
V99A, V100M; A2: T131S, S133T, G134T, V135P, M137L; A3: Y157V, M158G,
M159V, Y160F, Q165H; A4: L268K, K269G, D270A, G273D; B1: D255G, H257K,
L258V, H266Q.
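The mutation lists above use the usual one-letter substitution notation (wild type residue, position, replacement). As a minimal sketch, the lists can be parsed and counted by area to confirm that areas A1 to A4 give the 18 substitutions of 18xCHS and that adding B1 gives the 22 of 22xCHS. The function and variable names are illustrative assumptions; the substitution codes are those given in the text and in the claims.

```python
import re

# Substitutions grouped by structural area, as listed in the text.
AREAS = {
    "A1": ["D96A", "V98L", "V99A", "V100M"],
    "A2": ["T131S", "S133T", "G134T", "V135P", "M137L"],
    "A3": ["Y157V", "M158G", "M159V", "Y160F", "Q165H"],
    "A4": ["L268K", "K269G", "D270A", "G273D"],
    "B1": ["D255G", "H257K", "L258V", "H266Q"],
}

def parse_mutation(code):
    """Split a code such as 'D96A' into (wild type, position, replacement)."""
    match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", code)
    if match is None:
        raise ValueError(f"not a substitution code: {code!r}")
    wild_type, position, replacement = match.groups()
    return wild_type, int(position), replacement

def collect(area_names):
    """Parsed substitutions for the given areas, ordered by sequence position."""
    parsed = [parse_mutation(code) for name in area_names for code in AREAS[name]]
    return sorted(parsed, key=lambda mutation: mutation[1])

mutant_18x = collect(["A1", "A2", "A3", "A4"])
mutant_22x = collect(["A1", "A2", "A3", "A4", "B1"])

print(len(mutant_18x), len(mutant_22x))
print([position for _, position, _ in mutant_22x])
```

Keeping the areas as separate lists makes it straightforward to build intermediate constructs (for example A2 alone) when testing which regions contribute to the change in cyclization specificity.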






The crystal structural coordinates of the 18xCHS mutant are presented in
Appendix C. Table 6 shows the relative active site α-carbon coordinates of the
18xCHS mutant possessing STS-like activity.

TABLE 6

Active Site
α-Carbon Number   X Position    Y Position    Z Position    Amino Acid
      1              3.754        -8.620        58.411      Thr 132
      2              0.541       -10.075        59.960      Thr 133
      3              0.228        -9.423        49.613      Met 137*
      4              0.230        -7.076        55.634      Gln 161
      5              9.260       -15.931        61.148      Thr 194
      6              6.542       -18.097        57.263      Thr 197
      7             13.288       -17.295        51.888      Gly 211
      8           [illegible]   [illegible]   [illegible]   Gly 216
      9              6.827       -10.404        45.169      Ile 254
     10              2.304       -13.379        49.664      Gly 256
     11              1.944       -17.210        54.954      Leu 263
     12              5.520       -16.124        49.059      Phe 265
     13              8.197       -14.531        42.889      Leu 267
     14             11.540        -7.480        56.987      Ser 338
     15              8.611        -9.306        62.954      Glu 192

* Met 137 from the second monomer



Table 7 shows the wild type CHS active site positions that differ from the
coordinates listed in Table 6. The unlisted positions are equivalent for both CHS-like
Claisen and STS-like aldol cyclization specificity.

TABLE 7

Active Site
α-Carbon Number   X Position    Y Position    Z Position    Amino Acid
      1              4.033        -8.884        58.744      Thr 132
      2              3.656       -11.697        61.297      Ser 133
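To make the difference between Tables 6 and 7 concrete, the displacement of each α-carbon can be computed as the Euclidean distance between its wild type and 18xCHS coordinates. The sketch below assumes the two coordinate sets are expressed in the same reference frame and in ångströms, which the tables imply but do not state; the dictionary names are illustrative.

```python
import math

# Alpha-carbon coordinates keyed by active site alpha-carbon number.
# Values are copied from Table 6 (18xCHS) and Table 7 (wild type CHS).
mutant_ca = {
    1: (3.754, -8.620, 58.411),    # Thr 132
    2: (0.541, -10.075, 59.960),   # Thr 133
}
wild_type_ca = {
    1: (4.033, -8.884, 58.744),    # Thr 132
    2: (3.656, -11.697, 61.297),   # Ser 133
}

# Displacement of each alpha-carbon between wild type and mutant,
# assuming both structures share one reference frame.
shifts = {
    number: math.dist(wild_type_ca[number], mutant_ca[number])
    for number in wild_type_ca
}

for number, shift in shifts.items():
    print(f"alpha-carbon {number}: shift {shift:.2f}")
```

Under that assumption, position 1 moves by roughly half an ångström while position 2, the site of the S133T substitution, moves by several ångströms, which is the larger backbone change in this region.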



Table 8 shows various amino acid positions where mutations thereof can
enable or enhance STS-like activity in CHS mutants. The α-carbon positions are
those observed in the 18xCHS crystal structure. The comparison of crystal structures
may identify further positions that produce similar results.

TABLE 8

Enabling                                                               Location
α-Carbon Number   X Position    Y Position    Z Position   Mutation    Designation
      1            [missing]     -14.634        67.063      V98L        A1
      2             -0.144       -13.492        69.602      V99A        A1
      3            [missing]     -13.818        72.285      V100M       A1
      4              4.117        -6.516        61.579      T131S       A2
      5              0.541       -10.075        59.960      S133T       A2
      6             -1.599        -9.886        63.127      G134T       A2
      7             -3.665       -12.840        64.483      V135P       A2
      8              0.228        -9.423        49.613      M137L*      A2
      9             -1.7[?]5      -0.801        63.145      M158G       A3
     10             -0.401        -5.049        58.793      Y160F       A3
     11              3.525       -11.762        46.471      D255G       B1
     12             -0.844       -15.289        50.586      H257K       B1
     13             -2.269       -15.735        54.104      L258V       B1
     14              5.803       -16.354        45.249      H266Q       B1
     15              8.069       -13.510        39.218      L268K       A4
     16             10.985       -12.040        37.288      K269G       A4
     17             14.223       -10.808        38.865      D270A       A4

These results show that a functional conversion of CHS to STS can be achieved
by designing mutations in the CHS sequence based on CHS structural information.

While the foregoing has been presented with reference to particular
embodiments of the invention, it will be appreciated by those skilled in the art that
changes in these embodiments may be made without departing from the principles and
spirit of the invention, the scope of which is defined by the appended claims.

Appendix A - Pinus sylvestris STS



ATOM      #  TYPE RES                 X       Y       Z   OCC     B
ATOM      1  CB   ASP A   5      15.478 -29.459  49.168  1.00 67.43  A
ATOM      2  CG   ASP A   5      16.008 -30.062  47.877  1.00 68.10  A
ATOM      3  OD1  ASP A   5      17.184 -30.480  47.850  1.00 68.26  A
ATOM      4  OD2  ASP A   5      15.247 -30.116  46.890  1.00 68.59  A
ATOM      5  C    ASP A   5      16.056 -27.113  48.532  1.00 65.16  A
ATOM      6  O    ASP A   5      17.024 -26.582  47.995  1.00 64.39  A
ATOM      7  N    ASP A   5      15.729 -27.703  50.902  1.00 67.43  A
ATOM      8  CA   ASP A   5      16.237 -28.193  49.588  1.00 66.83  A
ATOM      9  N    PHE A   6      14.800 -26.782  48.261  1.00 64.20  A
ATOM     10  CA   PHE A   6      14.453 -25.779  47.266  1.00 63.13  A
ATOM     11  CB   PHE A   6      12.938 -25.783  47.053  1.00 63.60  A
ATOM     12  CG   PHE A   6      12.353 -27.157  46.885  1.00 64.51  A
ATOM     13  CD1  PHE A   6      11.522 -27.695  47.866  1.00 66.17  A
ATOM     14  CD2  PHE A   6      12.642 -27.923  45.760  1.00 63.90  A
ATOM     15  CE1  PHE A   6      10.987 -28.976  47.729  1.00 65.63  A
ATOM     16  CE2  PHE A   6      12.113 -29.206  45.[...]
[...]
ATOM   2909  CA   SER A 384      33.221  47.205  89.305  1.00 17.18  A
ATOM   2910  CB   SER A 384      34.737  47.429  89.301  1.00 16.78  A
ATOM   2911  OG   SER A 384      35.377  46.504  88.434  1.00 19.20  A
ATOM   2912  C    SER A 384      32.704  47.244  87.873  1.00 16.69  A
ATOM   2913  O    SER A 384      32.259  46.231  87.333  1.00 14.89  A
ATOM   2914  N    VAL A 385      32.763  48.422  87.266  1.00 16.95  A
ATOM   2915  CA   VAL A 385      32.318  48.602  85.892  1.00 18.01  A
ATOM   2916  CB   VAL A 385      31.210  49.679  85.807  1.00 19.03  A
ATOM   2917  CG1  VAL A 385      30.892  49.986  84.356  1.00 19.61  A
ATOM   2918  CG2  VAL A 385      29.968  49.200  86.538  1.00 19.12  A
[...]
ATOM   2924  C    ALA A 386      34.[...]  50.237  82.459  1.00 20.60  A
ATOM   2925  O    ALA A 386      33.573  50.698  82.038  1.00 19.64  A
ATOM   2926  N    ILE A 387      35.757  50.949  82.530  1.00 20.08  A
ATOM   2927  CA   ILE A 387      35.832  52.344  82.093  1.00 21.90  A
ATOM   2928  CB   ILE A 387      35.864  53.324  83.293  1.00 20.45  A
ATOM   2929  CG2  ILE A 387      34.507  53.353  83.977  1.00 20.07  A
ATOM   2930  CG1  ILE A 387      36.969  52.923  84.274  1.00 20.21  A
ATOM   2931  CD1  ILE A 387      37.124  53.867  85.451  1.00 17.62  A
ATOM   2932  C    ILE A 387      37.076  52.591  81.233  1.00 23.48  A
ATOM   2933  O    ILE A 387      37.190  53.699  80.664  1.00 24.34  A
ATOM   2934  OXT  ILE A 387      37.929  51.679  [...]    1.00 24.25  A

Appendix C - 18xCHS Mutant



ATOM 


# 


TYPE 


RES 






X 




Y 




25 


occ 


B 




ATOM 


1 


CB 


VAL 


A 


2 


-13. 


230 


29 


.022 


69. 


882 


1. 


00 


30. 61 


A 


ATOM 


2 


CGI 


VAL 


A 


2 


-12. 


890 


29 


.579 


71. 


256 


1. 


00 


31.32 


A 


ATOM 


3 


CG2 


VAL 


A 


2 


-13. 


703 


27 


.583 


69. 


999 


1. 


00 


31.29 


A 


ATOM 


4 


C 


VAL 


A 


2 


-14. 


560 


29 


.365 


67. 


801 


1. 


00 


29.09 


A 


ATOM 


5 


O 


VAL 


A 


2 


-15. 


501 


28 


.610 


67. 


557 


1. 


00 


29. 96 


A 


ATOM 


6 


N 


VAL 


A 


2 


-15.591 


29 


.845 


70. 


002 


1. 


00 


30.39 


A 


ATOM 


7 


CA 


VAL 


A 


2 


-14. 


326 


29 


.883 


69. 


216 


1. 


00 


29.93 


A 


ATOM 


8 


N 


SER 


A 


3 


-13. 


700 


29 


.774 


66. 


873 


1. 


00 


27.49 


A 


ATOM 


9 


CA 


SER 


A 


3 


-13. 


814 


29 


.352 


65. 


482 


1. 


00 


25.81 


A 


ATOM 


10 


CB 


SER 


A 


3 


-13. 


481 


30 


.514 


64. 


548 


1. 


00 


25,39 


A 


ATOM 


11 


OG 


SER 


A 


3 


-12. 


104 


30 


.840 


64. 


623 


1. 


00 


24.75 


A 


ATOM 


12 


C 


SER 


A 


3 


-12. 


866 


28 


.195 


65. 


190 


1. 


00 


25.08 


A 


ATOM 


13 


O 


SER 


A 


3 


-11. 


910 


27 


.961 


65. 


931 


1. 


00 


24.70 


A 


ATOM 


14 


N 


VAL 


A 


4 


-13. 


134 


27 


. 478 


64. 


102 


1. 


00 


24.03 


A 


ATOM 


15 


CA 


VAL 


A 


4 


-12. 


298 


26 


.352 


63. 


704 


1. 


00 


23.31 


A 


ATOM 


16 


CB 


VAL 


A 


4 


-12. 


904 


25 


.609 


62. 


491 


1. 


00 


23.50 


A 


ATOM 


17 


CGI 


VAL 


A 


4 


-11. 


986 


24 


.474 


62. 


058 


1. 


00 


23.11 


A 


ATOM 


18 


CG2 


VAL 


A 


4 


-14. 


275 


25 


.069 


62. 


848 


1. 


00 


23.81 


A 


ATOM 


19 


C 


VAL 


A 


4 


-10. 


895 


26 


.833 


63. 


338 


1. 


00 


22.89 


A 


ATOM 


20 


O 


VAL 


A 


4 


-9. 


910 


26 


.129 


63. 


557 


1. 


00 


22.82 


A 


ATOM 


21 


N 


SER 


A 


5 


-10. 


813 


28 


.037 


62. 


778 


1. 


00 


22.60 


A 


ATOM 


22 


CA 


SER 


A 


5 


-9. 


529 


28 


. 613 


62. 


383 


1. 


00 


22.06 


A 


ATOM 


23 


CB 


SER 


A 


5 


-9. 


742 


29 


.969 


61. 


704 


1. 


00 


21.99 


A 


ATOM 


24 


OG 


SER 


A 


5 


-8. 


505 


30 


.545 


61. 


320 


1. 


00 


22.10 


A 


ATOM 


25 


C 


SER 


A 


5 


-8. 


610 


28 


.788 


63. 


587 


1 . 


00 


21.94 


A 


ATOM 


26 


O 


SER 


A 


5 


-7. 


435 


28 


.423 


63. 


542 


1. 


00 


22.12 


A 


ATOM 


27 


N 


GLU 


A 


6 


-9. 


151 


29 


.345 


64. 


665 


1. 


00 


21. 69 


A 


ATOM 


28 


CA 


GLU 


A 


6 


-8. 


372 


29 


.572 


65. 


875 


1 . 


00 


21.71 


A 


ATOM 


29 


CB 


GLU 


A 


6 


-9. 


195 


30 


.387 


66. 


879 


1. 


00 


23.91 


A 


ATOM 


30 


CG 


GLU 


A 


6 


-8. 


390 


30 


.969 


68. 


040 


1. 


00 


28.22 


A 


ATOM 


31 


CD 


GLU 


A 


6 


-7 . 


384 


32 


.032 


67. 


608 


1. 


00 


30.05 


A 


ATOM 


32 


OE1 


GLU 


A 


6 


-6. 


670 


32 


.566 


68. 


486 


1. 


00 


31.96 


A 


ATOM 


33 


OE2 


GLU 


A 


6 


-7. 


302 


32 


.340 


66. 


399 


1. 


00 


32.23 


A 


ATOM 


34 


C 


GLU 


A 


6 


-7. 


945 


28 


.234 


66. 


488 


1. 


00 


20.55 


A 


ATOM 


35 


O 


GLU 


A 


6 


-6. 


842 


28 


.109 


67. 


019 


1. 


00 


19.81 


A 


ATOM 


36 


N 


ILE 


A 


7 


-8. 


820 


27 


.235 


66. 


402 


1. 


00 


18.95 


A 


ATOM 


37 


CA 


ILE 


A 


7 


-8. 


522 


25 


.909 


66. 


937 


1 . 


00 


17.56 


A 


ATOM 


38 


CB 


ILE 


A 


7 


-9. 


766 


24 


. 987 


66. 


864 


1. 


00 


17.72 


A 


ATOM 


39 


CG2 


ILE 


A 


7 


-9. 


396 


23 


.560 


67. 


269 


1. 


00 


17.42 


A 


ATOM 


40 


CGI 


ILE 


A 


7 


-10. 


863 


25 


.532 


67. 


784 


1. 


00 


17.63 


A 


ATOM 


41 


CD1 


ILE 


A 


7 


-12. 


178 


24 


.790 


67 . 


693 


1. 


00 


17.98 


A 


ATOM 


42 


C 


ILE 


A 


7 


-7. 


367 


25 


.263 


66. 


169 


1 . 


00 


16.69 


A 


ATOM 


43 


O 


ILE 


A 


7 


-6. 


415 


24 


.767 


66. 


773 


1. 


00 


16.31 


A 


ATOM 


44 


N 


ARG 


A 


8 


-7. 


450 


25 


.283 


64. 


839 


1. 


00 


15.59 


A 


ATOM 


45 


CA 


ARG 


A 


8 


-6. 


411 


24 


.697 


63. 


984 


1. 


00 


15.15 


A 


ATOM 


46 


CB 


ARG 


A 


8 


-6. 


803 


24 


.836 


62. 


506 


1. 


00 


14.73 


A 


ATOM 


47 


CG 


ARG 


A 


8 


• -5. 


829 


24 


.202 


61. 


507 


1. 


00 


13.64 


A 


ATOM 


48 


CD 


ARG 


A 


8 • 


-6. 


090 


22 


.708 


61. 


297 


1. 


00 


13.58 


A 


ATOM 


49 


NE 


ARG 


A 


8 


-5. 


526 


21 


.844 


62. 


336 


1. 


00 


12.35 


A 


ATOM 


50 


CZ 


ARG 


A 


8 


-4. 


284 


21 


.358 


62. 


326 


1. 


00 


13.34 


A 


ATOM 


51 


NH1 


ARG 


A 


8 


-3. 


453 


21 


.644 


61. 


327 


1. 


00 


12. 64 


A 


ATOM 


52 


NH2 


ARG 


A 


8 


-3. 


870 


20 


.578 


63. 


319 


1. 


00 


12.49 


A 


ATOM 


53 


C 


ARG 


A 


8 


-5. 


048 


25 


.359 


64. 


211 


1. 


00 


15.17 


A 


ATOM 


54 


O 


ARG 


A 


8 


-4. 


027 


24 


. 678 


64. 


286 


1. 


00 


14.78 


A 


ATOM 


55 


N 


LYS 


A 


9 


-5. 


034 


26 


.687 


64. 


320 


1. 


00 


15.05 


A 


ATOM 


56 


CA 


LYS 


A 


9 


-3. 


781 


27 


.418 


64. 


528 


1. 


00 


15.25 


A 


ATOM 


57 


CB 


LYS 


A 


9 


-4. 


039 


28 


.932 


64. 


545 


1. 


00 


14. 94 


A 


ATOM 


58 


CG 


LYS 


A 


9 


-4. 


332 


29 


.532 


63. 


178 


1 . 


00 


15.21 


A 


ATOM 


59 


CD 


LYS 


A 


9 


-4. 


522 


31 


.041 


63. 


280 


1. 


00 


16.42 


A 


ATOM 


60 


CE 


LYS 


A 


9 


-4. 


697 


31 


.688 


61. 


910 


1. 


00 


16. 68 


A 


ATOM 


61 


NZ 


LYS 


A 


9 


-4. 


801 


33 


.172 


62. 


035 


1. 


00 


17.12 


A 


ATOM 


62 


C 


LYS 


A 


9 


-3. 


037 


27 


.028 


65. 


799 


1. 


00 


15.06 


A 
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ATOM 


63 


O 


LYS 


A 


9 


-1 , 


, 804 


27 


. 007 


65 . 


822 


1 . 


00 


15 . 


26 


A 


ATOM 


64 


N 


ALA 


A 


10 


-3 . 


,789 


2 6 


. 718 


66 . 


852 


1 . 


00 


15 . 


12 


A 


ATOM 


65 


CA 


ALA 


A 


10 


-3 . 


,206 


2 6 


.34 9 


68 . 


141 


1 . 


00 


14 . 


75 


A 


ATOM 


66 


CB 


ALA 


A 


10 


-4 . 


, 121 


Z 6 


. 830 


69 . 


ZD/ 


1 . 


00 


15 . 


94 


A 


ATOM 


67 


C 


ALA 


A 


10 


-2 . 


. 97 9 


2 4 


.848 


68 . 


O /CO 
ZOO 


1 . 


00 


1 A 

14 . 


ezr\ 

oU 


A 


ATOM 


68 


O 


ALA 


A 


10 


-2 . 


, 392 


2 4 


.38 0 


69 . 


2 4 8 


1 . 


00 


14 . 


85 


A 


ATOM 


69 


N 


GLN 


A 


11 


-3 . 


,42 6 


O A 

Z 4 


.0 99 


67 . 


Zb 1 


1 . 


00 






A 


ATOM 


70 


CA 


GLN 


A 


11 


-3 . 


,308 


22 


. 64 4 


67 . 


278 


1 . 


00 


13 . 


50 


A 


ATOM 


7 1 


CB 


GLN 


A 


11 


-4 . 


, 54 9 


22 


. 038 


66 . 


608 


1 . 


00 


12 . 


Q "1 


A 


ATOM 


72 


CG 


GLN 


A 


11 


-4 . 


, 852 


20 


.58 8 


66 . 


973 


1 . 


00 


13 . 


A 1 

41 


A 


ATOM 


73 


CD 


GLN 


A 


11 


-6 . 


, 114 


Z\J 


. 07 6 


66 . 


1 Q O 

z y z 


1 . 


00 


13 . 


1 Z 


A 


ATOM 


7 4 


OE1 


GLN 


A 


11 


-6 . 


101 


1 9 


.74 6 


65 . 


106 


1 . 


00 


14 . 


06 


A 


ATOM 


75 


NE2 


GLN 


A 


11 


-7 . 


214 


zu 


. 031 


67 . 


n o c 
U 6 o 


1 . 


00 




o o 

oz 


A 


ATOM 


76 


C 


GLN 


A 


11 


-2 . 


048 


22 


. 072 


66 . 


611 


1 . 


00 


13 . 


38 


A 


ATOM 


77 


O 


GLN 


A 


11 


-1 . 


550 


21 


. 025 


67 . 


032 


1 . 


00 


13 . 


66 


A 


ATOM 


78 


N 


ARG 


A 


12 


-1 . 


531 


22 


. 752 


65 . 


589 


1 . 


00 


13 . 


00 


A 


ATOM 


79 


CA 


ARG 


A 


12 


-0 . 


, 364 


22 


. 256 


64 . 


850 


1 . 


00 


12 . 


85 


A 


ATOM 


80 


CB 


ARG 


A 


12 


-0 . 


, 338 


22 


. 881 


63 . 


4 42 


1 . 


00 


12 . 


22 


A 


ATOM 


81 


CG 


ARG 


A 


12 


-0 . 


, 209 


24 


. 409 


63 . 


409 


1 . 


00 


12 . 


52 


A 


ATOM 


82 


CD 


ARG 


A 


12 


0 . 


, 264 


24 


. 8 92 


62 . 


036 


1 . 


00 


13 . 


75 


A 


ATOM 


83 


NE 


ARG 


A 


12 


-0 . 


672 


24 


. 5 61 


60 . 


957 


1 . 


00 


14 . 


03 


A 


ATOM 


84 


CS 


ARG 


A 


12 


-1 . 


757 


25 


.271  60.657  1.00 14.40      A
ATOM     85  NH1 ARG A  12      -2.052  26.364  61.353  1.00 14.54      A
ATOM     86  NH2 ARG A  12      -2.549  24.892  59.659  1.00 13.01      A
ATOM     87  C   ARG A  12       1.022  22.425  65.489  1.00 13.09      A
ATOM     88  O   ARG A  12       1.246  23.329  66.296  1.00 13.33      A
ATOM     89  N   ALA A  13       1.950  21.541  65.116  1.00 13.02      A
ATOM     90  CA  ALA A  13       3.333  21.606  65.607  1.00 13.42      A
ATOM     91  CB  ALA A  13       3.973  20.214  65.580  1.00 13.07      A
ATOM     92  C   ALA A  13       4.078  22.552  64.657  1.00 14.07      A
ATOM     93  O   ALA A  13       3.503  22.988  63.662  1.00 13.68      A
ATOM     94  N   GLU A  14       5.342  22.870  64.938  1.00 15.31      A
ATOM     95  CA  GLU A  14       6.072  23.788  64.055  1.00 16.79      A
ATOM     96  CB  GLU A  14       6.634  24.983  64.844  1.00 19.19      A
ATOM     97  CG  GLU A  14       7.664  25.800  64.043  1.00 22.25      A
ATOM     98  CD  GLU A  14       7.866  27.218  64.565  1.00 24.73      A
ATOM     99  OE1 GLU A  14       8.977  27.766  64.379  1.00 26.12      A
ATOM    100  OE2 GLU A  14       6.916  27.794  65.144  1.00 25.41      A
ATOM    101  C   GLU A  14       7.190  23.226  63.180  1.00 16.32      A
ATOM    102  O   GLU A  14       7.170  23.420  61.964  1.00 16.46      A
ATOM    103  N   GLY A  15       8.162  22.548  63.787  1.00 15.23      A
ATOM    104  CA  GLY A  15       9.
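Each record in the coordinate listing above gives the keyword ATOM, a serial number, atom name, residue name, chain, residue number, x, y and z coordinates, occupancy, temperature factor and a segment label. The following is a minimal sketch of how such records might be read into a program, assuming whitespace-delimited fields in exactly that order; the names `Atom`, `parse_atom_line` and `distance` are hypothetical and are not part of the original disclosure. The two sample records are transcribed from the listing.

```python
import math
from typing import NamedTuple


class Atom(NamedTuple):
    """One ATOM record (hypothetical container, for illustration only)."""
    serial: int
    name: str
    res_name: str
    chain: str
    res_seq: int
    x: float
    y: float
    z: float
    occupancy: float
    b_factor: float
    seg_id: str


def parse_atom_line(line: str) -> Atom:
    """Parse a whitespace-delimited, 12-field ATOM record.

    Assumption: fields never run together. Fixed-column PDB files can
    violate this, in which case column slicing is needed instead.
    """
    fields = line.split()
    if len(fields) != 12 or fields[0] != "ATOM":
        raise ValueError(f"not a 12-field ATOM record: {line!r}")
    _, serial, name, res_name, chain, res_seq, x, y, z, occ, b, seg = fields
    return Atom(int(serial), name, res_name, chain, int(res_seq),
                float(x), float(y), float(z), float(occ), float(b), seg)


def distance(a: Atom, b: Atom) -> float:
    """Euclidean distance between two atoms, in the units of the listing."""
    return math.dist((a.x, a.y, a.z), (b.x, b.y, b.z))


# Backbone N and CA of ALA 13, as given in the listing.
n = parse_atom_line("ATOM 89 N ALA A 13 1.950 21.541 65.116 1.00 13.02 A")
ca = parse_atom_line("ATOM 90 CA ALA A 13 3.333 21.606 65.607 1.00 13.42 A")
print(f"N-CA distance: {distance(n, ca):.3f}")
```

Checking a bond length between consecutive backbone atoms in this way is one simple sanity test for transcribed coordinates.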

That which is claimed is:

1. A method of producing a mutant polyketide synthase comprising:

(a) comparing a crystal structure of a wild type polyketide synthase with a 
crystal structure of a second polyketide synthase; 

(b) substituting one or more amino acids of the wild type polyketide
synthase with the amino acid residues at homologous positions in the 
second polyketide synthase; and 

(c) producing said mutant polyketide synthase. 

2. The method of claim 1, wherein the wild type polyketide synthase 
comprises at least fourteen active site α-carbons having the structural coordinates of
Table 1. 

3. The method of claim 2, wherein said one or more amino acids to be 
substituted are selected from the group consisting of positions 96, 98, 99, 100, 131, 
133, 134, 135, 137, 157, 158, 159, 160, 165, 255, 257, 258, 266, 268, 269, 270 and 
273. 

4. The method of claim 3, wherein one or more substitutions are selected 
from the group consisting of D96A, V98L, V99A, V100M, T131S, S133T, G134T, 
V135P, M137L, Y157V, M158G, M159V, Y160F, Q165H, D255G, H257K, L258V, 
H266Q, L268K, K269G, D270A and G273D. 

5. The method of claim 3, wherein said one or more amino acids 
comprise substitutions at positions 98, 131, 133, 134, 135 and 137. 

6. The method of claim 5, wherein the substitutions comprise V98L, 
T131S, S133T, G134T, V135P, and M137L.

7. The method of claim 5, wherein said one or more amino acids further 
comprise substitutions at positions 96, 99 and 100.

8. The method of claim 6, wherein the substitutions further comprise
D96A, V99A and V100M. 

9. The method of claim 5, wherein said one or more amino acids further 
comprise substitutions at positions 158 and 160. 

10. The method of claim 6, wherein the substitutions further comprise 
M158G and Y160F.

11. The method of claim 7, wherein said one or more amino acids further
comprise substitutions at positions 158, 160 and 269. 

12. The method of claim 8, wherein the substitutions further comprise 
M158G, Y160F and K269G. 

13. The method of claim 9, wherein said one or more amino acids further 
comprise substitutions at positions 157, 159 and 165. 

14. The method of claim 10, wherein the substitutions further comprise 
Y157V, M159V and Q165H. 

15. The method of claim 11, wherein said one or more amino acids further 
comprise substitutions at positions 157, 159, 165, 268, 270 and 273. 

16. The method of claim 12, wherein the substitutions further comprise 
Y157V, M159V, Q165H, L268K, D270A and G273D. 

17. The method of claim 15, wherein said one or more amino acids further 
comprise substitutions at positions 255, 257, 258 and 266. 

18. The method of claim 16, wherein the substitutions further comprise
D255G, H257K, L258V and H266Q. 

19. The method of claim 1, wherein said wild type polyketide synthase is a 
chalcone synthase.

20. The method of claim 1, wherein said second polyketide synthase is a
stilbene synthase. 

21. The method of claim 1, wherein said wild type polyketide synthase is a 
chalcone synthase and wherein said second polyketide synthase is a stilbene synthase. 

22. The method of claim 1, wherein said mutant polyketide synthase is
produced in vitro. 

23. The method of claim 1, wherein said mutant polyketide synthase is 
produced in vivo. 

24. The method of claim 23, wherein said mutant polyketide synthase is 
produced in a plant. 

25. A method of producing a mutant polyketide synthase, said method 
comprising: 

expressing a mutant polyketide synthase created by substituting one or more 
amino acids of a wild type polyketide synthase with the amino acid residues at 
homologous positions of a second polyketide synthase, wherein said amino acid 
residues are selected by comparing a crystal structure of the wild type polyketide 
synthase with a crystal structure of the second polyketide synthase. 

26. A method of producing a mutant polyketide synthase, said method 
comprising: 

synthesizing a mutant polyketide synthase created by substituting one or more 
amino acids of a wild type polyketide synthase with the amino acid residues at 
homologous positions of a second polyketide synthase, wherein said amino acid 
residues are selected by comparing a crystal structure of the wild type polyketide 
synthase with a crystal structure of the second polyketide synthase.

27. An isolated polyketide synthase comprising SEQ ID NO: 1, wherein
one or more amino acid residues are modified at one or more positions selected from 
the group consisting of positions 96, 98, 99, 100, 131, 133, 134, 135, 137, 157, 158, 
159, 160, 165, 255, 257, 258, 266, 268, 269, 270 and 273. 

28. The synthase according to claim 27, wherein said modifications are 
selected from the group consisting of D96A, V98L, V99A, V100M, T131S, S133T, 
G134T, V135P, M137L, Y157V, M158G, M159V, Y160F, Q165H, D255G, H257K, 
L258V, H266Q, L268K, K269G, D270A and G273D. 

29. The synthase according to claim 27, wherein said one or more amino 
acids comprise modifications at positions 98, 131, 133, 134, 135 and 137. 

30. The synthase according to claim 29, wherein the modifications 
comprise V98L, T131S, S133T, G134T, V135P, and M137L.

31. The synthase according to claim 29, wherein said one or more amino 
acids further comprise modifications at positions 96, 99 and 100. 

32. The synthase according to claim 30, wherein the modifications further 
comprise D96A, V99A and V100M. 

33. The synthase according to claim 29, wherein said one or more amino 
acids further comprise modifications at positions 158 and 160. 

34. The synthase according to claim 30, wherein the modifications further 
comprise M158G and Y160F. 

35. The synthase according to claim 31, wherein said one or more amino 
acids further comprise modifications at positions 158, 160 and 269. 

36. The synthase according to claim 32, wherein the modifications further 
comprise M158G, Y160F and K269G.

37. The synthase according to claim 33, wherein said one or more amino
acids further comprise modifications at positions 157, 159 and 165. 

38. The synthase according to claim 34, wherein the modifications further 
comprise Y157V, M159V and Q165H. 

39. The synthase according to claim 35, wherein said one or more amino 
acids further comprise modifications at positions 157, 159, 165, 268, 270 and 273. 

40. The synthase according to claim 36, wherein the modifications further 
comprise Y157V, M159V, Q165H, L268K, D270A and G273D. 

41. The synthase according to claim 39, wherein said one or more amino
acids further comprise modifications at positions 255, 257, 258 and 266. 

42. The synthase according to claim 40, wherein the modifications further 
comprise D255G, H257K, L258V and H266Q. 

43. A crystalline form of the synthase of claim 27.

44. A crystalline form of the synthase of claim 28. 

45. A nucleic acid encoding the synthase of claim 27. 

46. A nucleic acid encoding the synthase of claim 28. 

47. A method of altering the substrate specificity of a polyketide synthase 
comprising: 

(a) comparing a crystal structure of a wild type polyketide synthase with a 
crystal structure of a second polyketide synthase; and 

(b) substituting one or more amino acids in the active site of the wild type 
polyketide synthase with the amino acid residues at homologous 
positions in the second polyketide synthase.

48. The method of claim 47, wherein the wild type polyketide synthase
comprises at least fourteen active site α-carbons having the structural coordinates of
Table 1.

49. The method of claim 48, wherein said one or more amino acids to
be substituted are selected from the group consisting of positions 132, 133, 137, 161, 
194, 197, 211, 216, 254, 256, 263, 265, 267 and 338. 

50. A method of altering the activity of a polyketide synthase comprising: 

(a) comparing a crystal structure of a wild type polyketide synthase with a 
crystal structure of a second polyketide synthase; and 

(b) substituting one or more amino acids of the wild type polyketide 
synthase with the amino acid residues at homologous positions in the 
second polyketide synthase. 

51. The method of claim 50, wherein the wild type polyketide synthase
comprises at least fourteen active site α-carbons having the structural coordinates of
Table 1. 

52. The method of claim 51, wherein said one or more amino acids to be 
substituted are selected from the group consisting of positions 96, 98, 99, 100, 131, 
133, 134, 135, 137, 157, 158, 159, 160, 165, 255, 257, 258, 266, 268, 269, 270 and 
273. 

53. The method of claim 52, wherein one or more substitutions are selected 
from the group consisting of D96A, V98L, V99A, V100M, T131S, S133T, G134T, 
V135P, M137L, Y157V, M158G, M159V, Y160F, Q165H, D255G, H257K, L258V, 
H266Q, L268K, K269G, D270A and G273D. 

54. The method of claim 50, wherein said wild type polyketide synthase is 
a chalcone synthase.

55. The method of claim 50, wherein said second polyketide synthase is a
stilbene synthase. 

56. The method of claim 50, wherein said wild type polyketide synthase is 
a chalcone synthase and wherein said second polyketide synthase is a stilbene 
synthase. 

57. The method of claim 50, wherein the altered activity results in the 
formation of the product of the second polyketide synthase instead of the product of 
the wild type polyketide synthase. 

58. The method of claim 50, wherein the altered activity results in the 
formation of both the product of the second polyketide synthase and the product of the 
wild type polyketide synthase. 

59. The method of claim 56, wherein the altered activity results in the 
formation of resveratrol instead of chalcone. 

60. The method of claim 56, wherein the altered activity results in the 
formation of both resveratrol and chalcone. 

61. A method for altering the polyketide content of a plant by introducing
the nucleic acid of claim 45. 

62. A method for altering the polyketide content of a plant by introducing 
the nucleic acid of claim 46. 

63. The method of claim 61, wherein said polyketide is resveratrol. 

64. The method of claim 62, wherein said polyketide is resveratrol. 

65. A computer program on a computer readable medium, said computer 
program comprising instructions to cause a computer to:

(a) define two different polyketide synthases or fragments thereof based
on two sets of atomic coordinates derived from crystals of said two 
different polyketide synthases; and 

(b) compare the structure of said two different polyketide synthases. 

66. The computer program of claim 65, wherein at least one set of atomic 
coordinates are as set forth in PDB Accession No. 1BI5, PDB Accession No. 1D6F, 
PDB Accession No. 1D6I, PDB Accession No. 1D6H, PDB Accession No. 1BQ6, PDB
Accession No. 1CML, PDB Accession No. 1CHW, PDB Accession No. 1CGK, PDB
Accession No. 1CGZ, PDB Accession No. 1EE0, Table 1, Appendix A, Appendix B,
Appendix C, or portions thereof. 
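
By way of illustration only, the two steps recited in claim 65 (defining two synthases from two coordinate sets, then comparing them) might be sketched as follows. This is a minimal sketch under stated assumptions, not the claimed program: it assumes both coordinate sets are already superimposed in a common frame and that equal position numbers denote homologous residues. The function `compare_structures`, the dictionary layout and all coordinate values are hypothetical and are not taken from any PDB entry or table herein.

```python
import math

# Three-letter to one-letter codes for the residues used in this toy example.
ONE_LETTER = {"ASP": "D", "ALA": "A", "VAL": "V", "LEU": "L", "MET": "M"}


def compare_structures(first, second):
    """Compare two {position: (residue_name, (x, y, z))} maps of alpha-carbons.

    Assumes the two coordinate sets share one frame of reference and that
    identical position numbers are homologous (no alignment is performed).
    Returns (rmsd, substitutions), where substitutions uses the D96A style.
    """
    shared = sorted(first.keys() & second.keys())
    if not shared:
        raise ValueError("no shared positions to compare")
    squared = 0.0
    substitutions = []
    for pos in shared:
        name_a, xyz_a = first[pos]
        name_b, xyz_b = second[pos]
        squared += math.dist(xyz_a, xyz_b) ** 2
        if name_a != name_b:
            substitutions.append(
                f"{ONE_LETTER[name_a]}{pos}{ONE_LETTER[name_b]}")
    rmsd = math.sqrt(squared / len(shared))
    return rmsd, substitutions


# Invented coordinates, for illustration only.
wild_type = {
    96: ("ASP", (0.0, 0.0, 0.0)),
    98: ("VAL", (3.8, 0.0, 0.0)),
    100: ("VAL", (7.6, 0.0, 0.0)),
}
second = {
    96: ("ALA", (0.0, 0.0, 1.0)),
    98: ("LEU", (3.8, 1.0, 0.0)),
    100: ("MET", (7.6, 0.0, 0.0)),
}

rmsd, subs = compare_structures(wild_type, second)
print(f"RMSD over shared alpha-carbons: {rmsd:.3f}")
print("Candidate substitutions:", ", ".join(subs))
```

In practice the two structures would first need to be superimposed and their sequences aligned; the sketch omits both steps to keep the comparison itself visible.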

(Both views are from the CoA-Binding Tunnel)




